January 22, 2002

GPL Linux virtual machines and virtual machine clustering

Author: JT Smith

- By Grant Gross -

Last week I wrote a roundup of the virtual machine-like technology available for Linux, and an alert reader pointed out the User-mode Linux project.

Think of User-mode Linux (UML), a modification to the Linux kernel that's released under the GNU General Public License, as a cross between the VMware workstation product that allows users to run Linux and Windows side-by-side and larger virtual machine-type products that allow dozens of Linux copies to run on one server. Jeff Dike, leader of the project, says users have reported running as many as 50 virtual machines on one piece of hardware.

Here's Dike's description of a virtual machine, probably better than I can explain it, from an article published in Linux Magazine: "(Virtual machines) offer the ability to partition the resources of a large machine between a large number of users in such a way that those users can't interfere with one another.
Each user gets a virtual machine running a separate operating system with a certain amount of resources assigned to it. Getting more memory, disks, or processors is a matter of changing a configuration, which is far easier than buying and physically installing the equivalent hardware."

UML isn't the only Open Source project working on virtual machines or related technology. There are a couple of other projects released under the GNU GPL, unlike the mostly commercial and proprietary VM technologies I featured in the first article. Among the Open Source alternatives:

The FreeVSD project and its commercial counterpart Idaya market a Web-hosting platform that allows multiple virtual servers to be created on a single hosting server. The FreeVSD project's goals include this one: "To establish and support FreeVSD as the standard for global Web hosting whilst keeping it free from the constrictions and limitations of closed source software." Idaya offers ProVSD, while version 1.4.9 of FreeVSD is available for download here.

The Plex86 project has the goal of creating "an extensible open source PC virtualization software program which will allow PC and workstation users to run multiple operating systems concurrently on the same machine." This allows users to run Windows software in Linux, much like VMware's workstation product. It's being developed under the LGPL.

The vserver project works within the Linux kernel to allow users to "run general purpose virtual servers on one box, full speed," according to project leader Jacques Gelinas. Vserver is also released under the GNU GPL.

I asked UML's Dike about his progress on the project and what's next for it. One interesting idea he has is to use UML for clustering, a concept it took me a while to get my head around. Our email conversation follows. For more information, check out the project's extensive Web site, which includes case studies of UML being used in the real world, a list of uses for UML and screen shots of UML in action.

NewsForge: How long have you been working on the project?

Dike: It depends on when you believe the project started. I started thinking about the feasibility of a userspace port in late '98. I decided that there were probably no fundamental problems with the idea, and started writing code in early February '99. The first public sign of UML was my announcement on the kernel list in the first week of that June.

NewsForge: What part of the world are you in, and do you have another job besides this project?

Dike: New Hampshire, USA. I'm the CTO of a startup (addtoit.com) ... I'm doing some contracting on the side.

NewsForge: Any idea of how many users or downloads your project has?

Dike: No idea. Here are some random numbers though. :-) Uml-user has (as of Tuesday, Jan. 15) 275 subscribers, uml-devel has (as of Tuesday) 171 subscribers.

SourceForge has just over 70,000 downloads listed for me. However,
they have lost track of downloads at points in the past. Also,
UML is available from other mirrors, several other projects are
distributing UML, and it was in the 2.4.x-ac pool, which can be
downloaded from everywhere. So, 70,000 is probably a gross
underestimate, and I have no idea what would be more accurate.

For some reason, there are now hundreds of downloads from SF
every day, which is up drastically from a month or so ago. The
interesting thing is that page views have not increased similarly.

NewsForge: How many other developers are working on UML?

Dike: Basically, the project is me. Of course, I've had important contributions
from other people and I don't want to downplay them, but I'm the
only person doing work in the core UML code.

NewsForge: Explain how UML works -- it looks like it works kind of like VMware's workstation product, in that you can run different distributions side by side. Is that a fair comparison?

Dike: The overall effect is the same as VMware. You can boot up multiple
Linux virtual machines on a single host.

The design is radically different. VMware is a hardware x86 emulator which
can (in principle) boot any x86 OS kernel. UML is a port of Linux to Linux.
So, UML can only be a Linux guest. However, UML can run on any platform that
Linux runs on (such a port needs some work, and UML somewhat runs on ppc), in
contrast to VMware being restricted to x86.

NewsForge: How many VMs can you run at once?

Dike: I frequently run three to four copies of UML on my laptop (256M, 750 MHz PIII). There's a case study on the UML site
(http://user-mode-linux.sourceforge.net/case-studies.html) describing a 20-node virtual UML network running on a fairly modest PC. I've heard from other people who have run dozens of copies of UML on a single host -- the highest I've heard of is around 50.

NewsForge: Doesn't running four or five instances of UML on a laptop leave precious little RAM/other resources for each?

Dike: The default "physical" memory size for UML is 32M, which will run a fairly decent virtual system. My laptop has 256M in it. 32M * (4 or 5) = 128M or 160M. That leaves plenty of room for other things. I've never noticed multiple instances of UML causing a resource drag. Given the fact that other people have run dozens of UML instances on machines not too different from my laptop, I'd say that I'm not close to pushing any limits when I run four or five.
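Dike's arithmetic can be checked with a short sketch. The function name and its parameters are illustrative only, not part of UML, and the calculation assumes every instance uses its full 32M allotment while ignoring whatever the host itself needs:

```python
def uml_memory_budget(host_mb, instances, per_instance_mb=32):
    """Return (memory assigned to guests, memory left on the host) in MB.

    Hypothetical helper: assumes each UML instance consumes its whole
    "physical" memory allotment, which is a worst-case simplification.
    """
    guest_total = instances * per_instance_mb
    return guest_total, host_mb - guest_total


# Dike's laptop: 256M of RAM, four or five instances at the 32M default.
for n in (4, 5):
    used, left = uml_memory_budget(256, n)
    print(f"{n} instances: {used}M for guests, {left}M left over")
```

Under those assumptions, four instances take 128M and five take 160M, leaving 128M or 96M for everything else on the laptop.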

NewsForge: What's next for the project?

Dike: I'm currently concentrating on killing bugs and adding little bits of missing functionality so I can consider it stable and functional enough to say that
it has reached version 1.0. That will be a stable, robust, functionally
complete virtual machine.

After that, there are a number of very interesting clustering possibilities
for UML. There are a number of Linux clustering projects happening now,
and they will probably end up using UML as their development base, just
as many kernel hackers are using UML for development now. These clusters
are ultimately intended to be implemented as clusters of physical machines.
However, virtual clusters would be interesting in their own right. A UML
cluster running on multiple hosts running different OSes could provide its
processes transparent access to the combined resources of its hosts. Imagine
Apache inside a UML cluster having access to a MySQL database on its Linux
host available as a filesystem inside UML, to the database engine on its
OS/400 host, and to apps on its Windows host.

NewsForge: So you have one machine with several VMs on it connected to a cluster
-- and so the cluster can have access to any of the virtual machines on that machine? What's the advantage to this?

Dike: No, you'd spread a virtual cluster over multiple hosts. It would look
like a single UML, but it would have multiple virtual processors and each
of them would be running on a different host.

In the example I gave, there would be three hosts, running Linux, OS/400, and
Windows, and there would be a single UML running on all of them, just as a
cluster of physical machines runs a single kernel on multiple boxes.

NewsForge: Wouldn't a virtual cluster in essence just have the computing
resources of that one machine?

Dike: No, because it would be a single UML instance spread over multiple hosts. So it would have access to the combined resources of those machines.

NewsForge: So people are actually using UML for more than applications testing? It sounds like people are using UML to do the mainframe style of VM things, running multiple copies of Linux on one machine doing different functions.

Dike: Kernel development is probably the biggest use of UML right now. A number of people are using it to build virtual networks, for educational purposes
and for testing (e.g. the FreeS/WAN people are using UML as their testbed).

Others are using it to jail services like bind and sendmail. That adds an
extra layer of security for services that have a history of exploits.

I've heard from a number of ISPs who are interested in using UML to offer
virtual colocation. I don't know of any that have put it into production.

In addition, there are lots of people who find it convenient to be able
to fire up another Linux box whenever they want. They have all kinds of
different reasons, e.g.:

  • playing with new kernels
  • playing with new distributions
  • setting up and testing new services
  • maintaining packages that require a whole system to test (e.g. RPM)

NewsForge: What is the ultimate goal for the project?

Dike: I'm not sure it has an ultimate goal. Obviously, I'd like to see virtual
machines be a standard fixture in server rooms everywhere, not just server
rooms that have S/390s in them. And obviously, I'd like those virtual
machines to be UMLs.

After that happens, I'm going to start looking for UML clusters to start
taking over the world ...

