January 11, 2007

How to run Linux inside Linux with User Mode Linux

Author: Marc Abramowitz

User Mode Linux (UML) allows you to run Linux kernels as user mode processes under a host Linux kernel, giving you a simple way to run several independent virtual machines on a single piece of physical hardware. Let's take a look at UML and how it can give you more bang for the hardware buck, or make it easier to debug the kernel.

Under UML, each of the virtual machines can run its own selection of software, including different distributions of Linux and different kernels. This gives you the ability to have completely customizable virtual machines that are isolated from each other, and from the host machine. Among other things, you can use this technology to secure systems by containing vulnerabilities, to give developers and sysadmins private sandboxes for development and testing, and to debug problems in kernels using familiar userspace utilities, such as gdb.

Trying out UML

Taking UML for a quick spin is easy using the "Getting Started" instructions at the new UML Home Page. The instructions show you how to download and run a precompiled UML guest kernel and a 1.6GB (80 MB compressed) Fedora Core 5 root file system.

It's easy to get started, as you don't need any special host kernel support, unlike other virtualization software such as Linux-VServer and Xen. However, as you'll see later on in this article, you might want to use a host kernel compiled with the SKAS3 patch if you're going to be running UML on an ongoing basis. Let's look at the sample command from the "Getting Started" instructions:

./linux-2.6.19-rc5 ubda=FedoraCore5-x86-root_fs mem=128M

Notice that you're running the UML guest kernel (linux-2.6.19-rc5) just as you'd run any other process from the command line. Be aware that the UML guest kernel is compiled specially so that it can be run from the command line. If you try to run an ordinary Linux kernel from the command line, it won't work. I'll show you later how to build a UML guest kernel.

The mem parameter simply specifies the amount of RAM that the virtual machine will have. Note that this can exceed the amount of memory on the physical machine, as the host kernel can use virtual memory to supply memory to the virtual machine.

However, since UML kernels run as normal processes on the host, you are limited by the size of the process address space for the architecture of your host machine. For x86, this is typically 3GB.

The ubda parameter in this command is giving the UML kernel the name of a file to use to create the virtual machine's /dev/ubda virtual block device, which will be its root filesystem. The /dev/ubda virtual block device is the first block device of the virtual machine and is analogous to the /dev/hda physical block device of the host Linux system.

In this case, you're setting /dev/ubda to be a virtual block device with data from a file containing a Fedora Core 5 root file system. This is the file system that the UML system will boot from. You can specify several block devices and they need not be file systems. For example, you might want to create a /dev/ubdb device and have it be a swap partition. To do this, you'd create a file on the host system before running your UML kernel:

dd if=/dev/zero of=swap bs=1M count=128

This command creates a file with a size of 128 MB. Then you'd use the ubdb parameter while invoking the UML kernel:

./linux-2.6.19-rc5 ubda=FedoraCore5-x86-root_fs ubdb=swap mem=128M

When the virtual machine boots up, you'll notice that you have two block devices:

[root@localhost ~]# ls -l /dev/ubd*
brw-r----- 1 root disk 98,  0 Dec 14 02:19 /dev/ubda
brw-r----- 1 root disk 98, 16 Dec 14 02:19 /dev/ubdb

To set up /dev/ubdb as swap space, and confirm that it worked, you'd do the following:

[root@localhost ~]# mkswap /dev/ubdb
Setting up swapspace version 1, size = 134213 kB
[root@localhost ~]# swapon /dev/ubdb
[root@localhost ~]# swapon -s
Filename                                Type            Size    Used    Priority
/dev/ubdb                               partition       131064  0       -1

If you change your mind and decide to instead use /dev/ubdb as an ext3 file system, you could do this:

[root@localhost ~]# swapoff /dev/ubdb
[root@localhost ~]# mkfs.ext3 /dev/ubdb
[root@localhost ~]# mkdir /mnt/ubdb
[root@localhost ~]# mount /dev/ubdb /mnt/ubdb
[root@localhost ~]# ls -l /mnt/ubdb
total 12
drwx------ 2 root root 12288 Dec 14 02:22 lost+found

Virtual block devices can also be pointed at physical block devices on the host system. For example, you could specify ubdb=/dev/cdrom while invoking the UML kernel to be able to access a CD-ROM drive on the host system within the virtual machine.

It is also possible to specify a tar file on the host system as a block device in the virtual machine. You could specify ubdb=hello.tar while invoking the UML kernel and then use the command tar -xf /dev/ubdb inside the virtual machine to extract the contents of the tar file. This is one simple way of transferring files from the host to the virtual machine.
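For instance, the host side of that round trip might look like this (the file and directory names here are just placeholders):

```shell
# On the host: pack the files to transfer into a tar archive
mkdir -p transfer
echo "hello from the host" > transfer/hello.txt
tar -cf hello.tar -C transfer hello.txt

# Boot the guest with the archive attached as its second block device:
#   ./linux-2.6.19-rc5 ubda=FedoraCore5-x86-root_fs ubdb=hello.tar mem=128M

# Then, inside the guest, read the archive straight off the block device:
#   tar -xf /dev/ubdb
#   cat hello.txt
```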

COW: Copy-On-Write

Imagine that you have several very similar virtual machines. You'd like them all to use the same root file system image, as this would save a lot of space. Sharing a writable image is a bad idea, though, since all of the machines could write to it and corrupt each other's data. So you'd like each virtual machine to have its own file system that it can write to, but making a separate copy of the image for each machine would waste a lot of disk space, especially since the copies would likely be nearly identical, differing only in minor ways.

UML solves this problem with a feature called Copy-on-write (COW). The basic idea is that you can create a virtual block device from two files: one that is read-only and contains all of the shared data, and another that is read-write and stores all of the private changes. By specifying ubdb=cowfile,sharedfile when invoking the UML kernel, you're creating a /dev/ubdb device that writes changes to a file called cowfile, while using the file sharedfile for the much larger shared read-only data. One important thing to keep in mind is that the cowfile is sparse; at first glance, it may appear to be just as large as the sharedfile, but if you use ls -ls, you'll see that it's not really as big as it first seems:

host% ls -lsh cowfile
2.9K -rw-r--r-- 1 marc marc 1.6G Dec 22 16:58 cowfile

At first glance, the size of the file looks to be 1.6GB, but if you look at the first column, you'll see that the number of bytes occupied on disk is in fact only 2.9KB. I haven't done any tests to measure the relative performance of accessing a COW file system, but in theory, using COW is likely to lead to increased performance, since the host system can save physical memory by not caching two copies of the same data.
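You can see the same sparse-file effect on any Linux host, without UML at all. dd with a seek= offset creates a file whose apparent size far exceeds the space it occupies on disk (the file name here is arbitrary):

```shell
# Create a sparse file: seek past 1GB and write a single byte at the end.
# Only the block containing that byte actually occupies disk space.
dd if=/dev/zero of=sparse.img bs=1 count=1 seek=1G 2>/dev/null

# Compare the apparent size (the size column) with the blocks
# actually allocated on disk (the first column)
ls -lsh sparse.img
du -h sparse.img
```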

For more details on COW, see the Sharing Filesystems between Virtual Machines page on the UML Web site.

Using hostfs to access files on the physical host from the virtual machine

UML provides a few ways for virtual machines to access files on the physical host. The simplest method is to use hostfs. Running the following command within the guest makes the entire host file system available as /host in the virtual machine:

mount -t hostfs none /host

Here's a slightly more complex example that shows how to mount just one particular directory on the host file system.

mount -t hostfs none /home/marc -o /home/marc

The hostfs method does have some limitations, and there's another method called humfs that solves some of these problems. For more details on hostfs and humfs, see the Host File Access page on the UML Web site.
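If you want a hostfs mount to come up automatically when the guest boots, an /etc/fstab entry along these lines should work (the /host mount point is just a convention):

```
# /etc/fstab on the guest: mount the host's root file system at /host on boot
none   /host   hostfs   defaults   0   0
```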

Building your own UML guest kernel

So far I've only shown you how to use UML by downloading a precompiled UML guest kernel. You can also build your own UML guest kernels, and it's no more difficult than compiling a host Linux kernel if you're using a recent kernel version.

You can download Linux kernel source code for the Linux kernel of your choice from kernel.org, or you can get the kernel source code by installing the relevant packages from a Linux distribution of your choice. Note that it is easier to build UML guest kernels using source code from Linux kernel versions 2.6.9 and later, because kernels 2.6.9 and later come with built-in UML support.

For older versions of the kernel, you'll need to download and apply appropriate patches. I recommend going with a more recent version of the kernel if at all possible. The downloaded kernel source code can be unzipped and untarred as usual. The main difference in compiling a UML guest kernel is that you must add ARCH=um to your make commands. For example, here's a typical sequence of commands for setting up a default UML kernel configuration, customizing it, and then building the UML kernel:

make defconfig ARCH=um
make menuconfig ARCH=um
make ARCH=um

When the kernel build is complete, you will have a large executable file called "linux". The file is large because it includes debugging symbols. If you don't anticipate using a debugger and you want to shrink the kernel, you can strip the debugging symbols with the strip command. To run your new UML kernel, just execute it with ./linux, providing a ubda= parameter with a file containing the root file system, as shown previously. For example:

./linux ubda=FedoraCore5-x86-root_fs

For more details on building UML kernels, see the Building from source page on the UML Web site.

The Separate Kernel Address Space (SKAS) patch

Even with early versions of UML, you could run guest kernels without any special support in the host kernel by using something called Tracing Thread (TT) mode. TT mode, however, has disadvantages that impact the security and performance of UML. To address these issues, UML author Jeff Dike implemented patches to the host kernel that allow UML to run in a superior mode called Separate Kernel Address Space (SKAS) mode, now known as SKAS3 mode.

For a while, TT mode and SKAS3 mode were the only choices, and SKAS3 mode was widely regarded as the superior of the two. Thus, if you were going to be doing serious work with UML, you were best served by building for yourself a host kernel with what is now known as the SKAS3 patch. TT and SKAS3 mode are further described at the skas mode page on the UML Web site.

More recently, Dike developed a patch for UML guest kernels called SKAS0, which allows UML guest kernels to have better security and performance than with TT mode, and without the need to patch the host kernel. SKAS0 support is included in Linux kernels from version 2.6.13 onward, so if you're using a modern kernel, you are likely getting the benefits of running your UML guest kernels in SKAS0 mode without having to do anything special. If your UML guest kernel is indeed running in SKAS0 mode, you'll see lines like this near the top of the output when booting your UML guest kernel:

Checking for the skas3 patch in the host:
  - /proc/mm...not found
  - PTRACE_FAULTINFO...not found
  - PTRACE_LDT...not found
UML running in SKAS0 mode

This indicates that UML could not find the features required in the host kernel for SKAS3 mode, and fell back to using SKAS0 mode. Falling back is not so bad, since SKAS0 mode is superior to TT mode. Still, SKAS3 mode offers better performance and security than SKAS0 mode, so if you're going to be doing a lot of work with UML, it may be worth your while to build a host kernel with SKAS3 support. Dike has mentioned the possibility of a SKAS4 patch, so depending on when you're reading this, it might already be available.

UML is a rapidly evolving technology and so, unfortunately, some of the documentation on the Web site is a bit outdated. In particular, SKAS0 mode doesn't seem to be described on the SKAS mode page at the time that I'm writing this article. Hopefully, by the time you read this, the documentation will be updated.

SKAS0 mode is, however, explained in Dike's book, User Mode Linux and in an email to the Linux Kernel Mailing List.

Creating your own root filesystem

The easiest way to get started with UML is to download prebuilt root filesystems, such as the great filesystems available at http://uml.nagafix.co.uk/. Eventually, though, you may want to create your own root filesystems.

The "Creating your own filesystems" page on the UML Web site describes a number of tools that aid in building root filesystems, such as mkrootfs, UML Builder, gBootRoot, and rootstrap.

Networking your UML guest

At some point, you'll also want to set up networking for your UML guests. You can do this in several ways, and the specifics vary quite a bit depending on your network configuration, but the easiest approach is to create a virtual eth0 interface in the guest that is connected to a TUN/TAP interface on the host.

If your host Linux kernel already has TUN/TAP support (or you have the tun.ko kernel module, loaded by running modprobe tun), then you would create a TAP interface on the host using the UML tunctl utility, and configure it with an IP address on your network using ifconfig.
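Assuming the tunctl utility from the uml_utilities package is installed, the host-side setup might look roughly like this. These commands must be run as root, and the user name, interface name, and IP address are all placeholders you'd adjust for your own network:

```
# Load the TUN/TAP driver if it isn't built into the host kernel
modprobe tun

# Create a persistent tap0 interface owned by the user who runs UML
tunctl -u marc -t tap0

# Give the host end of the link an address on its own small subnet
ifconfig tap0 192.168.0.254 up
```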

To set up the guest end, you can add eth0=tuntap,,,<IP of TAP interface on host> to the UML command line, which will result in the creation of an eth0 interface in the UML guest. It is also possible to hot-plug the eth0 interface using the uml_mconsole utility.

With the eth0 interface created in the guest, you will need to assign an IP to the interface (or obtain one via DHCP) and you will probably want to use the route command to configure a default route through the new interface. Depending on your network configuration, you may also need to enable IP forwarding on the host and/or set up forwarding rules with iptables.
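Assuming the host's TAP interface is at 192.168.0.254, the guest-side configuration might look roughly like this (the addresses are just examples):

```
# Inside the guest: assign an address on the same subnet as the
# host's TAP interface
ifconfig eth0 192.168.0.253 up

# Route all traffic through the host end of the link
route add default gw 192.168.0.254

# On the host (as root), if the guest should reach beyond the host:
#   echo 1 > /proc/sys/net/ipv4/ip_forward
#   iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE
```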

Running X programs on your UML guest

You might be wondering whether it's possible to run X programs on UML guests, and the answer is yes. Once you've got networking set up, you can set the DISPLAY environment variable on the guest and run X programs on the guest that display on the host's X server.

You might be thinking that this is all well and good, but you'd like to do more than just run a few X programs on the guest; you'd like a full desktop environment, such as GNOME or KDE. This is possible too. You can run the Xnest program on the guest and have it display to the host's X server. Xnest acts as a client of the host's X server while also acting as an X server for programs on the guest machine. For example, you could do something like this to run a GNOME desktop on the guest that you can see in a window on the host display:

guest% Xnest :1 &
guest% gnome-session --display=:1 &


UML allows you to create completely independent virtual machines that run isolated from each other, complete with their own Linux kernels. You could use UML to increase security by completely isolating server processes from each other and from the physical machine. Or you could use it to mimic other development or production environments, even giving users root access to the virtual machine. You could also use it to try out new software without having to worry about it wrecking your system.