November 4, 2015

Open Source Security Process Part 2: Containers vs. Hypervisors - Protecting Your Attack Surface

In part two of this series, Xen Project Advisory Board Chairman Lars Kurth discusses the different security vulnerabilities of containers and hypervisors. Read Part 1: A Cloud Security Introduction.

With virtualization software, every element of every interface is an opportunity to make a mistake, and every piece of code called from that interface may introduce a vulnerability that allows an attacker to break through it. Together, these make up what is called the “attack surface.” IT therefore needs to be concerned with two risks: a vulnerability in the interface to the virtualization software itself, and the amount of code that is executed when that interface is called.
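To make the idea concrete, here is a toy model. It assumes that attack surface can be measured as the set of code reachable from an interface's entry points, and every function name in it is made up for illustration (these are not real kernel or Xen symbols). A breadth-first walk over a hypothetical call graph computes that set:

```python
from collections import deque

# Hypothetical call graph: each entry point or internal function maps to the
# functions it calls. All names are illustrative, not real kernel/Xen symbols.
CALL_GRAPH = {
    "sys_read": ["vfs_read"],
    "sys_write": ["vfs_write"],
    "vfs_read": ["fs_read_page", "pipe_read"],
    "vfs_write": ["fs_write_page", "pipe_write"],
    "fs_read_page": [],
    "fs_write_page": [],
    "pipe_read": [],
    "pipe_write": [],
    "hc_event_channel_op": ["evtchn_send"],
    "evtchn_send": [],
}


def attack_surface(entry_points, call_graph):
    """Return every function reachable from the entry points (breadth-first)."""
    seen = set()
    queue = deque(entry_points)
    while queue:
        fn = queue.popleft()
        if fn in seen:
            continue
        seen.add(fn)
        queue.extend(call_graph.get(fn, []))
    return seen


wide = attack_surface(["sys_read", "sys_write"], CALL_GRAPH)
narrow = attack_surface(["hc_event_channel_op"], CALL_GRAPH)
print(len(wide), len(narrow))  # prints: 8 2
```

The numbers are invented. What the model captures is that each additional entry point brings in everything it calls.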

Let’s look at how attack surfaces arise from the functionality that popular cloud technologies expose, and where those technologies are similar and where they differ.

First Attack Vector: Operating System Functionality

The basic building blocks for both virtualization and containers are schedulers and memory management. Xen re-implements this functionality entirely from scratch within the hypervisor. KVM, by contrast, reuses the functionality exposed by the Linux kernel, augmented by the KVM kernel module, to schedule VMs and manage their memory. Under this model, every VM is a Linux process scheduled by the Linux scheduler, and its memory is allocated by the Linux memory allocator. Containers reuse the kernel in the same way.

The key difference between KVM and containers is that KVM runs a separate kernel instance within each VM (either a different version of Linux or an entirely different operating system kernel, such as Windows) and uses a kernel module and QEMU to implement additional compartmentalization. Containers, on the other hand, all share a single kernel instance: the same syscall interface is used both to manage containers and by the applications running inside them.

Although reusing the Linux kernel in containers has many advantages, one key trade-off is the breadth of functionality exposed through the syscall interface, together with the code behind it. This increases the risk of errors and exploitable vulnerabilities compared to Xen, which implements only what is necessary, and to KVM, which reuses kernel functionality internally without exposing it directly to guest VMs.

If you compare the Linux kernel syscall interface with the Xen hypercall interface, we are talking about an order of magnitude difference in exposed interfaces (more than 300 system calls in Linux versus 40 hypercalls in Xen). A kernel gives you filesystems with files, directories, seek, fstat, read, mmap, and aio. How many different kinds of sockets can you create? How many different IPC mechanisms are there? Futexes, shared memory, ioctls, TTYs: all of them carry lots of internal state and corner cases that have to be handled correctly. There are many more opportunities to make a mistake in the Linux system call interface, and, as Table 1 below shows, more mistakes are made, and made more regularly, as a result. Each of those mistakes adds to the complexity and to the attack surface.
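The size comparison can be checked with a quick calculation using the figures quoted above. Because "more than 300" is only a lower bound, the ratio computed here is a minimum:

```python
import math

# Interface sizes quoted in the text: more than 300 Linux system calls
# versus 40 Xen hypercalls. 300 is used as a lower bound.
LINUX_SYSCALLS_MIN = 300
XEN_HYPERCALLS = 40

ratio = LINUX_SYSCALLS_MIN / XEN_HYPERCALLS
orders_of_magnitude = math.log10(ratio)

print(f"at least {ratio:.1f}x, about {orders_of_magnitude:.2f} orders of magnitude")
# prints: at least 7.5x, about 0.88 orders of magnitude
```

A ratio of at least 7.5 comes close to a full order of magnitude. Counting entry points alone also understates the gap, because many system calls accept a wide range of flags and modes, each with its own corner cases.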

Second Attack Vector: Device Emulation

Hypervisors such as Xen and KVM use QEMU as a device model to emulate server hardware components, such as the motherboard, timers, interrupts and I/O, which are exposed to drivers in the guest’s kernel. Xen uses emulation only in some cases (e.g. for HVM guests on x86, but not for PV guests on x86 and not on ARM), whereas KVM relies entirely on QEMU (or alternatives providing similar functionality).
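The cases in the previous paragraph can be summarized as a small decision function. The string labels for hypervisor, guest type and architecture are hypothetical, and the function covers only the cases named in the text:

```python
def uses_qemu_device_emulation(hypervisor: str, guest_type: str, arch: str) -> bool:
    """Return True if QEMU device emulation is part of the guest's attack
    surface, following the cases described in the text (simplified)."""
    if hypervisor == "kvm":
        # KVM relies on QEMU (or an alternative device model) regardless of
        # guest type or architecture.
        return True
    if hypervisor == "xen":
        # Xen uses QEMU emulation for HVM guests on x86 only; PV guests on
        # x86 and guests on ARM do not use it.
        return guest_type == "hvm" and arch == "x86"
    raise ValueError(f"unknown hypervisor: {hypervisor}")


print(uses_qemu_device_emulation("xen", "pv", "x86"))  # prints: False
```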

Device emulation is complex for two reasons. First, hardware interfaces are designed for ease of hardware implementation, often at the expense of ease of software implementation. Second, a large number of devices need to be emulated. This is partly why we have recently seen a surge of vulnerabilities in QEMU, following the attention drawn by the VENOM vulnerability in QEMU.
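Part of what makes a device model risky is that every register write comes from an untrusted guest. The toy sketch below models a hypothetical device, not any specific QEMU device, and is written in Python for readability (QEMU itself is written in C). It shows the bounds check and per-command state reset that an emulated buffer needs. VENOM (CVE-2015-3456) was a buffer overflow in the FIFO of QEMU's virtual floppy disk controller.

```python
class EmulatedFifo:
    """Toy device-model FIFO fed by guest writes to an I/O port.

    Hypothetical device for illustration only.
    """

    SIZE = 16

    def __init__(self) -> None:
        self.buf = bytearray(self.SIZE)
        self.pos = 0

    def guest_write(self, value: int) -> None:
        # The guest decides how many bytes it writes and in what order,
        # so the device model must never trust it to stay within bounds.
        if self.pos >= self.SIZE:
            # In C, omitting this check (or forgetting to reset pos) would
            # write past the end of buf into adjacent host memory.
            raise OverflowError("guest overran FIFO; command rejected")
        self.buf[self.pos] = value & 0xFF
        self.pos += 1

    def complete_command(self) -> bytes:
        # Resetting the position at the end of every command matters as
        # much as the bounds check itself.
        data = bytes(self.buf[: self.pos])
        self.pos = 0
        return data


fifo = EmulatedFifo()
for byte in b"\x01\x02\x03":
    fifo.guest_write(byte)
print(fifo.complete_command())  # prints: b'\x01\x02\x03'
```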


| Vulnerability type (guest to host) | Linux as a general container | Xen PV |
|---|---|---|
| Privilege escalation | 7 - 9 | 3 - 5 |
| Denial of service | | 5 - 7 |
| Information leak | | |

Table 1: Results of an investigation of significant vulnerabilities, presented at FOSDEM 2015. A similar investigation has not yet been performed for 2015, because it requires significant effort. Part 3 of this series explains some of the challenges in putting together a fair comparison of vulnerabilities across different technologies. Figures for Xen HVM are similar to those for KVM with QEMU.

Third Attack Vector: I/O and Device Drivers

As stated earlier, I/O to and from the system is the primary route for malicious payloads into cloud-based systems. Attacks typically try to exploit a vulnerability in a device driver, which gives an attacker access to everything in the system, because device drivers run in kernel mode. In virtualized environments, a device driver vulnerability may give an attacker control of a single VM, but the attacker would need to exploit an additional vulnerability to gain control of the host and attack other VMs running on it.
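This argument can be expressed as a toy model in which a host compromise requires a chain of exploits, one per isolation layer. The model assumes, purely for illustration, that layers fail independently, and it uses made-up probabilities. Under those assumptions, the extra layer in a VM multiplies the odds down:

```python
def breakout_probability(layer_exploit_probs):
    """Probability that every isolation layer is exploited, assuming the
    layers fail independently (a strong, simplifying assumption)."""
    p = 1.0
    for prob in layer_exploit_probs:
        p *= prob
    return p


# Hypothetical per-layer probabilities, for illustration only.
P_HOST_KERNEL = 0.1  # shared host kernel, reached directly from a container
P_GUEST_KERNEL = 0.1  # guest kernel inside a VM
P_HYPERVISOR = 0.1  # hypervisor or device model below the VM

container = breakout_probability([P_HOST_KERNEL])
vm = breakout_probability([P_GUEST_KERNEL, P_HYPERVISOR])
print(f"container: {container:.3f}, vm: {vm:.3f}")  # prints: container: 0.100, vm: 0.010
```

Real exploits are not independent coin flips, so the numbers mean little. The model only shows why the second barrier matters.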

Because emulation is inherently slow, Xen and KVM avoid it for disk and network access by using paravirtualized (PV) I/O drivers, which tend to be extremely simple. Contrast this with containers, which use the syscall interface of the single kernel instance to interact with device drivers. As pointed out earlier, the syscall interface is very large, far larger than PV I/O and emulated I/O combined. This, together with the fact that device drivers run in kernel mode, creates a significantly higher risk for container deployments than for hypervisor deployments.
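Part of that simplicity comes from the shape of the interface. Instead of emulating device registers, a PV frontend driver in the guest and a backend on the host exchange requests through shared memory. The sketch below is a simplified Python model of such a single-producer/single-consumer ring. It illustrates the general pattern and is not Xen's or virtio's actual ring layout. Its whole interface is push, pop and two indices:

```python
class SharedRing:
    """Simplified producer/consumer ring in the style used by PV drivers.

    Illustrative only; not the actual Xen or virtio ABI.
    """

    def __init__(self, size: int) -> None:
        if size <= 0 or size & (size - 1):
            raise ValueError("size must be a power of two")
        self.size = size
        self.slots = [None] * size
        self.prod = 0  # free-running producer index (frontend advances it)
        self.cons = 0  # free-running consumer index (backend advances it)

    def push(self, request) -> bool:
        """Frontend side: enqueue a request, or return False if the ring is full."""
        if self.prod - self.cons == self.size:
            return False
        self.slots[self.prod & (self.size - 1)] = request
        self.prod += 1
        return True

    def pop(self):
        """Backend side: dequeue the oldest request, or None if the ring is empty."""
        if self.cons == self.prod:
            return None
        request = self.slots[self.cons & (self.size - 1)]
        self.cons += 1
        return request


ring = SharedRing(4)
ring.push({"op": "read", "sector": 0})
print(ring.pop())  # prints: {'op': 'read', 'sector': 0}
```

A real backend still has to treat every request as untrusted, but validating a fixed-format request is a much smaller job than emulating a hardware device.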


Many of the recently developed security features for containers, such as per-container ulimit, capability reduction, device access restrictions, improved handling of Linux Security Modules (SELinux, AppArmor), improved user namespaces and others are all examples of defense-in-depth (multiple layers of protection).
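Defense in depth means that a request must pass every layer, so a single weak layer is not enough for an attacker on its own. The minimal sketch below shows that composition. Its policies are hypothetical stand-ins for capability reduction, device access restrictions and an LSM rule, and real enforcement happens inside the kernel, not in user-space code like this:

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical policies; the capability names are real Linux capabilities,
# but the allowlists and the LSM-style rule are made up for illustration.
ALLOWED_CAPS = {"CAP_NET_BIND_SERVICE"}  # reduced capability set
ALLOWED_DEVICES = {"/dev/null", "/dev/urandom"}  # device access allowlist

Layer = Tuple[str, Callable[[dict], bool]]

LAYERS: List[Layer] = [
    ("capabilities", lambda r: r.get("capability") is None or r["capability"] in ALLOWED_CAPS),
    ("devices", lambda r: r.get("device") is None or r["device"] in ALLOWED_DEVICES),
    ("lsm_policy", lambda r: not r.get("path", "").startswith("/proc/sys")),
]


def first_denial(request: dict) -> Optional[str]:
    """Return the name of the first layer that denies the request, or None."""
    for name, allows in LAYERS:
        if not allows(request):
            return name
    return None


print(first_denial({"capability": "CAP_SYS_ADMIN"}))  # prints: capabilities
print(first_denial({"device": "/dev/null"}))  # prints: None
```

Every request that passes these layers still ends up in the same shared kernel, which is why such layers reduce, but do not remove, the exposure described in this section.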

Despite these efforts, the fundamental problem with containers compared to hypervisors persists: even when minimal Linux distros are used, containers expose an exceedingly large attack surface, which makes them inherently vulnerable. This is why technologies that try to combine virtualization with containers, such as Hyper, Clear Containers and Xen Containers, are starting to build momentum.

Read Part 3: Are Today’s Open Source Security Practices Robust Enough in the Cloud Era?
