Xen 3.0 and the Art of Virtualisation
Ian Pratt of the University of Cambridge described both the features coming in the upcoming 3.0 release of the Xen virtualisation system and virtualisation more generally. Xen's current stable release is 2.4. I walked away with a better understanding of virtualisation than I previously had.
Virtualisation, Pratt explained, is a single operating system image creating the appearance of multiple operating systems on one system. In
essence, it is chroot on steroids. Full virtualisation is the comprehensive emulation of an existing system.
Para-virtualisation is similar, but in this scenario, a guest operating system running on top of a real operating system is aware that it is not in actual control of the computer and is only a virtual machine. Xen and User-mode Linux both fall under this category of virtualisation.
The x86 architecture common to most desktop computers today is not
designed for virtualisation, and Pratt described it as a bit of a pig to
work with for that purpose.
Pratt asked the question, "why virtualise?" and provided fairly
straightforward answers.
Many data-centres have hundreds or thousands of machines running single
operating systems, often each running a single piece of software or
service. With virtualisation, each one of those machines can host several
operating systems, each running their own set of services, and thus
massively reduce the amount of hardware needed for the operation.
Xen takes this one step further and allows clusters of virtual machine
hosts with load balancing and fail-over systems.
Pratt explained that if a Xen virtual machine host in a Xen cluster
detects imminent hardware failure, it can hand off its virtual machine
guest operating systems to another node and die peacefully, without taking
the services it was hosting with it. Meanwhile, people using the services
may not even be aware that anything changed, as they would continue more or
less uninterrupted.
Using the same principle, the Xen virtual machine hosting clusters allow
load balancing. If several virtual machines are running across a few
hosts, the host cluster can transfer busier virtual machines to less busy
hosts to avoid overloading any one node in that cluster. This allows an
even higher number of virtual machines to run on the same amount of
hardware and can serve to further reduce hardware costs for an
organisation.
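The load balancing idea can be sketched in a few lines. This is only an illustration of the principle Pratt described, not Xen's actual policy: the function name `rebalance`, the host and virtual machine names, and the load figures are all made up for the example.

```python
def rebalance(hosts):
    """Move the busiest VM from the most loaded host to the least loaded one.

    `hosts` maps a host name to a dict of {vm_name: load}. Returns the
    (vm, source, destination) move that was made, or None if no move helps.
    """
    def total(name):
        return sum(hosts[name].values())

    source = max(hosts, key=total)
    dest = min(hosts, key=total)
    if source == dest or not hosts[source]:
        return None

    vm = max(hosts[source], key=hosts[source].get)
    load = hosts[source][vm]
    # Only migrate if the destination ends up less loaded than the source was.
    if total(dest) + load >= total(source):
        return None

    hosts[dest][vm] = hosts[source].pop(vm)
    return vm, source, dest


# Hypothetical cluster: host "a" is much busier than host "b".
cluster = {"a": {"web": 60, "db": 30}, "b": {"mail": 10}}
print(rebalance(cluster))  # ('web', 'a', 'b')
print(rebalance(cluster))  # None
```

A real cluster would also have to weigh the cost of the migration itself, which this sketch ignores.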
Within a virtual machine host server, each virtual machine should be
contained, explained Pratt, to reduce the risk to other virtual machines
on the same server should one of them become infected with malicious
software or otherwise suffer some kind of problem.
In order to run Xen, only the kernel needs replacing. No software above
that has to be aware of its new role as a slave operating system within a
larger system. Xen currently works with Linux versions 2.4 and 2.6(.12),
OpenBSD, FreeBSD, Plan 9, and Solaris. Because guest kernels
have to communicate with hardware like any other kernel, they must be
patched to be aware of their parent operating system and talk to it
through Xen. A guest kernel attempting to make direct contact with the
hardware on the system will likely fail.
Modifications to the Linux 2.6 kernel to make it work with Xen were
limited to changes in the arch/ kernel source subdirectory, claimed Pratt.
Linux, he said, is very portable.
Virtualised kernels have to understand two sets of times, while normal
kernels only have to be aware of one, noted Pratt.
A normal kernel that is not in a virtual machine has full access to all
the hardware at all times on the system. Its sense of time is real. A
second going by in kernel time is a second going by on the clock on the
wall. However, when a kernel is being virtualised, a second going by for
the kernel can be several seconds of real time as it is sharing the
hardware with all the other kernels on that same computer. Therefore a
virtualised kernel must be aware of both real wall-clock time and virtual
processor time, the time during which it actually has access to the hardware.
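To make the two clocks concrete, here is a small simulation. It assumes a simple round-robin scheduler with fixed time slices on a single CPU, which is my own simplification and not how Xen actually schedules guests; the function name `run_round_robin` and the guest names are made up.

```python
def run_round_robin(guests, slice_ms, wall_ms):
    """Simulate guests sharing one CPU in fixed round-robin time slices.

    Returns {guest: virtual_ms}, the CPU time each guest actually received
    while `wall_ms` of wall-clock time went by.
    """
    virtual = {g: 0 for g in guests}
    elapsed = 0
    turn = 0
    while elapsed < wall_ms:
        # The last slice may be cut short when wall-clock time runs out.
        step = min(slice_ms, wall_ms - elapsed)
        virtual[guests[turn % len(guests)]] += step
        elapsed += step
        turn += 1
    return virtual


# One second of wall-clock time shared by four guests in 10 ms slices.
print(run_round_robin(["dom1", "dom2", "dom3", "dom4"], 10, 1000))
# {'dom1': 250, 'dom2': 250, 'dom3': 250, 'dom4': 250}
```

Each guest sees only a quarter of a second of processor time for every second on the wall clock, which is why a guest kernel that counted only its own running time would drift behind real time.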
Among the features coming in Xen 3.0 is support for X86_64 and for SMP
systems. Coming soon to a Xen near you is the ability for guest kernels to
use virtual CPUs up to a maximum of 32 per system (even if there are not
that many real CPUs!) and add and remove them while running, taking hot
swapping to a whole new virtual level.
While I do not fully understand protection rings (perhaps someone who does can
elaborate in comments), Pratt explained how Xen runs under 32-bit x86
versus 64-bit x86 in terms of those rings. In X86_32, Xen runs in
ring 0, the guest kernel runs in ring 1, and the user-space provided to
the virtual machine runs in ring 3. In X86_64, Xen runs in ring 0 and the
virtual machine's user-space runs in ring 3, but this time, the guest
kernel also runs in ring 3 because of the massive memory address space
provided by the extra 32 bits. With 8 terabytes of memory address space
available, Xen can assign different large blocks of memory using widely
separate addresses, where it would be more constrained under the 32-bit
address space.
The goal of the SMP support system in Xen is to make it both decent and
secure. SMP scheduling, however, is difficult. Gang scheduling, where
multiple jobs are sent to multiple CPUs at the same time, said Pratt, can
cause CPU cycles to be wasted, and so processes have to be dynamically
managed to maintain efficiency.
For memory management, Pratt said, Xen operates differently from other
virtualisation systems. It assigns page-tables for kernel and user-space
in virtual machines to use, but does not control them once assigned. For
discussion between kernel-space and user-space memory, however, requests
do have to be made through the Xen server. Virtual machines are restricted
to memory they own and cannot leave that memory space, except under
special, controlled shared memory circumstances between virtual machines.
The Xen team is working toward the goal of having unmodified, original
kernels run under Xen, allowing legacy Linux kernels, Windows, and other
operating systems to run on top of Xen without knowing that they are
inside a virtual machine. Before that can happen though, Xen needs to be
able to intercept all system calls from the guest kernels that can cause
failures and handle them as if Xen is not there.
Pratt returned to the topic of load balancing and explained the process of
transferring a virtual machine from one host in a Xen cluster to another.
Assuming two nodes of a cluster are on a good network together, a 1GB
memory image would take 8 seconds in ideal circumstances to transfer to
another host before it could be resumed. This is a lengthy down-time that
can be noticed by mission critical services and users, so a better system
had to be created to transfer a running virtual machine from one node to
another.
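The 8 second figure is consistent with a one gigabit per second link, which is my assumption about what "a good network" means here. The arithmetic, with a made-up helper name `transfer_seconds`:

```python
def transfer_seconds(image_bytes, link_bits_per_second):
    """Ideal time to push a memory image over a link, ignoring all overhead."""
    return image_bytes * 8 / link_bits_per_second


GB = 10**9          # 1 GB memory image, in bytes
GIGABIT = 10**9     # assumed link speed, in bits per second

print(transfer_seconds(1 * GB, GIGABIT))  # 8.0
```

Eight bits per byte means a gigabyte needs eight seconds on a gigabit link even before protocol overhead is counted.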
The solution they came up with was to devote ten percent of the resources
used by the virtual machine being moved to transferring it to its new home,
thus not significantly impacting its performance in the meantime. The entire memory
block in which the virtual machine is operating is then transferred to its
new home -- repeatedly. Each time, only those things in memory which have
changed since the last copy are transferred, and because not everything
changes, each cycle goes a little bit faster, and fewer things change.
Eventually, there are so few differences between the old and new hosts'
copies of the virtual machine's memory that the virtual machine is paused, the
last changes in memory are copied over, and the virtual machine is
resumed at its new location. Total down-time in the case of a busy
webserver he showed statistics for was on the order of 165 milliseconds,
after approximately a minute and a half of copying memory over in
A virtual machine running a Quake 3 server while grad students played the
game managed the transition with down-time ranging from 40 to 50
milliseconds, causing the grad students to not even be aware that any
changes were taking place.
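The repeated-copy approach can be modelled roughly as follows. This is a back-of-the-envelope sketch, not Xen's implementation: it assumes the guest dirties memory at a constant rate and that the link speed is constant, and the function name `precopy_rounds` and every number in the example are invented for illustration.

```python
def precopy_rounds(memory_mb, dirty_mb_per_s, link_mb_per_s, stop_mb,
                   max_rounds=30):
    """Model iterative copying of a running VM's memory.

    Returns (rounds, final_mb) where `rounds` is a list of
    (sent_mb, seconds) per copy pass and `final_mb` is what remains to be
    copied while the VM is paused.
    """
    rounds = []
    to_send = memory_mb  # the first pass copies everything
    for _ in range(max_rounds):
        if to_send <= stop_mb:
            break
        seconds = to_send / link_mb_per_s
        rounds.append((to_send, seconds))
        # Memory dirtied while this pass was in flight must be sent again.
        to_send = min(memory_mb, dirty_mb_per_s * seconds)
    return rounds, to_send


# Hypothetical guest: 1024 MB of memory, dirtying 10 MB/s, on a 100 MB/s link.
rounds, final_mb = precopy_rounds(1024, 10, 100, stop_mb=1)
for sent, seconds in rounds:
    print(f"copied {sent:8.3f} MB in {seconds:6.3f} s")
print(f"pause and copy last {final_mb:.3f} MB: {final_mb / 100 * 1000:.3f} ms")
```

Each pass is shorter than the one before as long as the guest dirties memory more slowly than the link can carry it. If it does not, the passes never shrink, which is why the sketch caps the number of rounds.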
Pratt said that the road-map for Xen 3.1 sees improved performance,
enhanced control tools, improved tuning and optimisation, and less manual
configuration to make it work.
He commented that Xen has a vibrant developer community and strong vendor
support which is assisting in the development of the project.
Intel architect Gordon McFadden ran another virtualisation-related talk in
the afternoon entitled: "Case study: Usage of Virtualised GNU/Linux to
Support Binary Testing Across Multiple Distributions".
The basic problem that faced McFadden was that he was charged with running
multiple Linux Standard Base tests
on multiple distributions on multiple platforms, repeatedly, and could not
acquire additional hardware to perform the task.
He described the LSB tests as time consuming, taking up to eight hours
each, but not hard on the CPU. The logical solution was to run the tests
concurrently using virtual machines. As a test was launched and set under
way on one virtual machine on a real machine, instead of waiting for it to
finish all day or for several hours, another test could be launched in
another virtual machine on the same machine. McFadden's virtual machine of
choice for the project was the User-Mode Linux (UML) virtual
machine.
The setup McFadden and his team used was the Gentoo Linux distribution
riding on top of kernel 2.6.11 and an XFS file-system. His reasoning for
using Gentoo was not philosophical, but simply that he had not used it
before and wanted to try something new. The file-systems of the virtual
machines were ext2 or ext3, but appeared to the host system as flat files
on the XFS file-system.
The tests were run on a 4GHz hyper-threaded system with 1GB of RAM, and
tested Novell Linux Desktop 10, Red Hat Enterprise Linux 3 and 4, and Red
Flag Linux. Each test case ran on an 8GB virtual file-system and was
assigned either 384 or 512MB of RAM.
To set up the systems, they were installed normally and dd'ed into flat
files to be mounted and used by the UML kernel.
The guest kernels were instantiated and loaded, and each popped up an xterm for
management. Each test could then be run by logging into the xterm,
starting NFS on the guest system, and running a test.
The result of the whole process was a quickly reusable hardware platform
that was economical both fiscally and in lab and desk space, though McFadden
did not relate the results of the LSB tests themselves.
Using virtual machines for testing has limitations as well, McFadden
noted. For one, it cannot be used to test hardware, and resource sharing
can sometimes become a problem. For example, if two kernels are vying for
control of one network interface, performance will be below par for both.
McFadden said he had alternatives to using virtualisation to run his
tests, but using boot loaders to continually load different
operating systems would have taken a lot longer, with long delays
because multiple tasks could not be performed at the same time. His other
alternative, VMware, was to be avoided as he was already familiar
with it and wanted to learn something new.
Following a brief thirty minute interlude that passed for dinner hour, BOF
sessions began for the evening. Among those that I attended was one
entitled "Debian Women: Encouraging Women Without Segregation" hosted by
Felipe Augusto van de Wiel (not a woman, incidentally).
The Debian-Women project started around DebConf 4 following a Debian
Project Leader (DPL) election debate question around how the DPL hopefuls
would handle attracting more women to the Debian project. The question
sparked a lengthy mailing list debate, as nearly anything in Debian can,
at the end of which a new group was born called Debian-Women, with its own
website by the same name.
Some research into open source projects found that the highest percentage
of women in a major project appeared to be about 1.6%. At the time of the
start of the Debian-Women project there were just 3 female Debian
developers, but in the year since there have been 10 added to the New
Maintainer Queue (NMQ, in Debian lingo).
Van de Wiel made the point repeatedly through the session that the
Debian-Women project is inclusive of men and not an exclusive club. Their
list and IRC channel provide a good place for people seeking help to get
it, regardless of gender.
The Debian-Women project's goal is to encourage and educate the Debian
community on the topic of equality and encourage women to volunteer in the
free software community.
Van de Wiel explained that he was running the session rather than one of
the Debian women as many of them are currently at DebConf 5 in Helsinki,
Finland and could not attend OLS this year.
The discussion touched on a recent flap at Debian over a package called
hotbabe, which featured an animated woman taking off a percentage of her
clothes based on the activity of the system's CPU, until, at 100%, she was
completely naked. Some complained that there was no option to have the
virtual stripper be male and, after a lengthy flame-war on the Debian
mailing lists, it was eventually dropped as not providing
anything new that Debian needed.
The point of this discussion, though, was the lack of awareness among men in
the community of the sensitivities of the women around us. These actions
don't serve to encourage female participation in the development process.
A related issue is that a good deal of
documentation in Debian refers to hypothetical developers as male,
rather than in a gender-neutral sense, further adding to the implicit bias
found in the development community.
Van de Wiel went on to discuss some of the things women in Debian are now
doing, including working on translations into 8 languages for the
project's own website and the assistance being provided to Debian Weekly
News.
Outside of Malaysia, where it was pointed out around 70% of IT workers are
female, there is a general cultural bias in favour of males in the field.
One attendee noted that a recent study in the US found that American
families typically spend four times as much on their male children as on
their female children for IT-related investment.
Another point made is that guys tend to enjoy studying Linux in their free
time, perhaps instead of their homework, while women tend to follow their
curriculum more precisely and thus are more likely to be familiar with a
Ultimately, more can be done to encourage more female developers to join
the community, as they are certainly out there.
The final session I attended on Friday was a BOF session led by Russell
McOrmond on the topic of Canadian copyright law, entitled simply
"GOSLING/Canadian copyright update".
GOSLING stands for "Get Open Source Logic Into Governments".
To start, McOrmond suggested Canadians in the room who have not yet done
so sign a petition
on the topic of copyright law in Canada asking the Canadian government not
to damage copyrights with a law they are proposing. He suggested that if
MPs receive signatures on a petition on an issue like this, they may
realise that there are actually Canadians who care about these issues
other than the business people who stand to profit from them.
Bill C-60, currently before the
House, would cause the author of software to be legally liable for
copyright violations carried out with the help of the software they have
written. It would give copyright ownership to people who take pictures,
regardless of the circumstances, including giving the copyright of a
picture of tourists taken by a friendly passer-by being handed a camera to
that passer-by. Photos contracted to be taken would remain under the
copyright of the photographer who took them. The act to amend the
copyright act, bill C-60, is 30 pages, translated, and amends the 80-page
Canadian Copyright Act currently in effect.
McOrmond noted that IBM has a lawyer in Canada named Peter K. Wang
actively lobbying the Canadian government for software patents in this
country. He suggested that an internal debate needs to take place at IBM
about whether or not they actually support software patents, especially as
some IBM employees at the conference had earlier expressed their
displeasure with the concept.
McOrmond listed several URLs that people interested in the copyright issue
in Canada should refer to: flora.ca/A246, goslingcommunity.org, www.cippic.ca, www.creativecommons.ca, www.forumonpublicdomain.ca,
www.efc.ca, www.digitalsecurity.ca, and www.softwareinnovation.ca.
Some American sites that deal with similar issues he listed are: www.eff.org, www.ffii.org, www.centerpd.org, and www.pubpat.org.
A point McOrmond made a number of times is that Canadian copyright law is
being influenced by a large subset of business-people in the
copyright-concerned community who would prefer that the Internet not
exist. But with the Internet clearly here to stay, we should be working on
ways to deal with copyright in a way that is beneficial to as many
Canadians as possible, not just a few.
The province of Quebec has long been a stronger defender of its culture
than most of the rest of Canada, and McOrmond suggested it would be
beneficial to the case of killing bill C-60 if the province of Quebec and
its dominant party in the Canadian parliament, the Bloc Québécois,
realised that the choice they are facing is between the copyright system
we know and the one we see in the United States. Quebec is usually the
first to act on this kind of thing and it may need to before the rest of
the country catches on.
A caution McOrmond had for the library community in Canada is that asking
for copyright exemptions for certain circumstances hurts everyone more
than it helps the libraries. As one example, allowing libraries to
exchange copyrighted information electronically as long as the information
self-destructs after a set amount of time would require running on a
platform that would enforce that self-destruction, and likely lock the
library system into a version of Windows capable of the task.