
Xen Related Work in the Linux Kernel: Current and Future Plans

The Linux kernel contains a lot of code supporting Xen. This code isn’t just meant to optimize Linux to run as a virtualized guest. As a type 1 hypervisor, Xen relies heavily on the support of the operating system running as dom0. Although other operating systems can be used as dom0, Linux is the most popular choice — due to its widespread use and for historical reasons (Linux was chosen as dom0 in the first Xen implementation). Given this, a lot of the work of adding new functionality to Xen is done in the Linux kernel.

In this article, I’ll cover some highlights of the Xen-related work that has been done in the past year and what’s expected in the near future, as well as a few best practices learned along the way. This post should be helpful for anyone interested in Xen Project technology and its impact on the Linux kernel.

History of Xen support in the Linux kernel

When the Xen Project was released in 2003, it was using a heavily modified Linux kernel as dom0. Over the years, a lot of effort has gone into merging those modifications into the official Linux kernel code base. And, in 2011, this goal was achieved.

However, because some distributions — like SUSE’s SLE — had included Xen support for quite some time, they had built up another pile of patches optimizing the Linux kernel to run as dom0 and as a Xen guest. For the past three years, it has been my job to merge those patches into the upstream Linux kernel. With Linux kernel 4.4, we finally made it possible to use the upstream kernel, without any Xen-specific patches, as the base for SLE.

The large number of patches needed in the Linux kernel stems from the primary design goal of Xen. It was introduced at a time when x86 processors had no special virtualization features, and Xen tried to establish an interface making it possible to run completely isolated guests on x86 with bare-metal-like performance.

This was possible only through paravirtualization. Instead of trying to emulate the privileged instructions of the x86 processor, Xen-enabled guests had to be modified to avoid those privileged instructions and to call into the hypervisor when a privileged operation was unavoidable. This, of course, had a large impact on the low-level operating system, leading to the large number of patches. Basically, the Linux kernel had to support a new architecture.
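To make this concrete, here is a minimal, self-contained sketch (plain C, not actual kernel code; all names are illustrative) of the kind of indirection such paravirtualization support requires: privileged operations are routed through a function-pointer table that is filled in at boot with either the native or the Xen-specific implementation.

```c
/*
 * Illustrative sketch of the pvops idea: privileged operations go through
 * a function-pointer table selected at boot. Names are made up.
 */
#include <stdio.h>

struct pv_mmu_ops {
    void (*write_cr3)(unsigned long pa);  /* load a new page-table base */
};

static void native_write_cr3(unsigned long pa)
{
    /* on bare metal: execute the privileged instruction directly */
    printf("mov %%cr3 <- %#lx\n", pa);
}

static void xen_write_cr3(unsigned long pa)
{
    /* as a PV guest: ask the hypervisor to do it via a hypercall */
    printf("hypercall: set cr3 to %#lx\n", pa);
}

static struct pv_mmu_ops pv_mmu_ops;

int main(void)
{
    int running_on_xen = 1;  /* in reality, detected during early boot */

    pv_mmu_ops.write_cr3 = running_on_xen ? xen_write_cr3
                                          : native_write_cr3;

    pv_mmu_ops.write_cr3(0x1000);  /* callers never see the difference */
    return 0;
}
```

This is essentially what the pvops framework does for dozens of low-level operations, which is also why it adds overhead even when the same kernel ends up running on bare metal.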

Although they still have some advantages over fully virtualized guests for some workloads, paravirtualized guests are a bit problematic from the kernel’s point of view:

  • The pvops framework this requires limits the performance of the same kernel when it runs on bare metal.

  • Introducing new features touching this framework is more complicated than it should be.

With virtualization support available in x86 processors for many years now, there is an ongoing campaign to move away from paravirtualized domains to hardware-virtualized ones. To get rid of paravirtualized guests completely, a new guest mode is needed: PVH. Basically, a PVH guest is like a fully virtualized guest but without emulation of legacy features like the BIOS. Many legacy features of fully virtualized guests are emulated via a qemu process running in dom0; dropping those legacy features avoids the need for that qemu process.

Full support of PVH will enable dom0 to run in this mode. dom0 can’t be run fully virtualized, as an ordinary fully virtualized guest requires legacy emulation delivered by a qemu process running in dom0, which for dom0 itself would raise a chicken-and-egg problem. More on PVH support and its problems will be discussed later.

Last Year with Xen and the Linux Kernel

So, what has happened in the Linux kernel regarding Xen in the last year? Apart from the ongoing correction of bugs, little tweaks, and adaptations to changed kernel interfaces, the main work has happened in the following areas:

  • PVH: After a first test implementation of PVH, the basic design was modified to use the fully virtualized interface as a starting point and to avoid the legacy features.

This has led to a clean model requiring only a very small boot prologue that sets some indicators for avoiding the legacy features later on. The old PVH implementation was removed from the kernel and the new one introduced. This enables the Linux kernel to run as a PVH guest on top of Xen. dom0 PVH support isn’t complete yet, but we are progressing.

  • Restructuring to allow configuring a kernel with Xen support but without paravirtualized guest support: This can be viewed as a first step toward finally getting rid of a major part of the pvops framework. Today, such a kernel is capable of running as a PVH or fully virtualized guest (with some paravirtualized interfaces, like paravirtualized devices), but not yet as dom0.

  • ARM support: There has been significant work on Xen for ARM (both 32- and 64-bit platforms), for example, support for guests with a page size different from dom0’s.

  • New paravirtualized devices: New frontend/backend drivers have been introduced or are in the process of being introduced, such as PV-9pfs and a PV-socket implementation.

  • Performance of guests and dom0: This has been my primary area of work over the past year. In the following, I’ll highlight two examples along with some background information.

Restructuring of the xenbus driver

As a type 1 hypervisor, Xen has a big advantage over a type 2 hypervisor: It is much smaller; thus, the probability of the complete system failing due to a software error is smaller. This, however, holds only as long as no other component is a single point of failure, as dom0 is today.

Given this, I’m trying to add features to Xen that disaggregate it into redundant components by moving essential services into independent guests (e.g., driver domains containing the backends of paravirtualized devices).

One such service running in dom0 today is the Xenstore. Xenstore is designed to handle multiple outstanding requests. It is possible to run it in a “xenstore domain” independent of dom0, but this configuration hadn’t been optimized for performance until now.

The reason for this performance bottleneck was the xenbus driver, which is responsible for communication with a Xenstore running in another domain (with Xenstore running as a dom0 daemon, this driver is used only by guest domains and by the dom0 kernel when accessing Xenstore). The xenbus driver could handle only one Xenstore access at a time. This is a major bottleneck because, during domain creation, there are often multiple processes trying to access Xenstore at once. This was fixed by restructuring the xenbus driver so that multiple requests to Xenstore don’t block each other more than necessary.
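To illustrate the difference, here is a simplified sketch (an assumed structure, not the real xenbus code) contrasting a driver that serializes every Xenstore access with one that tags each request with an ID so several can be in flight at once:

```c
/*
 * Illustrative sketch: serialized vs. concurrent request handling.
 * Not the actual xenbus implementation.
 */
#include <stdio.h>
#include <pthread.h>

/* Before: one global lock held for the whole request/response round trip. */
static pthread_mutex_t xs_lock = PTHREAD_MUTEX_INITIALIZER;

static void xs_request_serialized(int req)
{
    pthread_mutex_lock(&xs_lock);   /* every other caller now waits */
    printf("request %d sent, waiting for its response...\n", req);
    pthread_mutex_unlock(&xs_lock);
}

/* After: each request carries an ID; the lock only guards the queue. */
static int next_id;
static pthread_mutex_t ring_lock = PTHREAD_MUTEX_INITIALIZER;

static int xs_request_concurrent(int req)
{
    int id;

    pthread_mutex_lock(&ring_lock);   /* held only while enqueueing */
    id = next_id++;
    printf("request %d queued with id %d\n", req, id);
    pthread_mutex_unlock(&ring_lock);

    /* the caller sleeps until the response carrying 'id' arrives */
    return id;
}

int main(void)
{
    xs_request_serialized(1);
    xs_request_concurrent(2);
    xs_request_concurrent(3);   /* no longer blocked behind request 2 */
    return 0;
}
```

The key change is that the lock is no longer held across the full round trip to Xenstore, so domain creation, with its many concurrent accesses, no longer funnels through a single pending request.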

Finding and repairing a performance regression of fully virtualized domains

This problem kept me busy for the past three weeks. In some tests comparing performance between fully virtualized guests with a recent kernel and a rather old one (pre-pvops era), several benchmarks performed very poorly on the new kernel. Fortunately, the tests were very easy to set up and the problem could be reproduced easily; for example, a single munmap() call for an 8 kB memory area took twice as long on the new kernel as on the old one.

So, as a kernel developer, the first thing I tried was bisecting. Knowing the old and the new kernel versions, I could use Git to find the commit that introduced the regression. The git bisect process is very easy: you tell Git the last known good version and the first known bad version, and it interactively does a binary search until the offending commit has been found.

At each iteration step, you have to test and tell Git whether the result was good or bad. In the end, I had a rather disturbing result: a commit meant to enhance performance was to blame. And at the time the patch was written (some years ago), it had been shown to really increase performance.

The patch in question introduced some more paravirtualized features for fully virtualized domains. So, the next thing I tried was to disable all paravirtualized features (this is easily done via a boot parameter of the guest). Performance was up again; well, for the munmap() call, not for the rest (e.g., I/O handling). The overall performance of a fully virtualized guest without any paravirtualization features enabled is dismal due to the full emulation of all I/O devices, including the platform chipset. So, the only thing I learned was that the enabled paravirtualization features made munmap() slow.

I tried modifying the kernel to be able to disable the various paravirtualized features one at a time, hoping to find the one to blame. I suspected PV time handling to be the culprit, but had no success: neither PV timers, the PV clocksource, nor PV spinlocks were to blame.

Next idea: use ftrace to get timestamps of all the kernel functions called during the munmap() call. Comparing the timestamps of the test run once with PV features enabled and once without should show the part of the kernel to blame. The result was again rather odd; the time seemed to be lost gradually over the complete trace.

With perf, I was finally able to find the problem: it showed a major increase in TLB misses with the PV features enabled. It turned out that enabling PV features requires mapping a Xen memory page into guest memory. The way this was done in the kernel required the hypervisor to split a large-page mapping into many small pages. Unfortunately, that large page contained the main kernel mappings accessed, e.g., when executing kernel code.

Moving the mapping of the Xen page into an area already mapped via small pages solved the problem.
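As a toy illustration of why the placement mattered (this simulates the effect; it is not the actual kernel fix), consider what happens when a single 4 kB page inside a 2 MB large-page mapping has to be remapped:

```c
/*
 * Toy illustration: remapping one 4 kB page inside a 2 MB large-page
 * mapping forces the whole large page to be shattered into 512 small
 * PTEs, reducing TLB reach for everything else in that region.
 */
#include <stdio.h>

#define LARGE_PAGE_SIZE (2UL << 20)   /* 2 MB */
#define SMALL_PAGE_SIZE (4UL << 10)   /* 4 kB */

struct mapping {
    const char *name;
    int uses_large_pages;
};

static void map_xen_page(struct mapping *area)
{
    if (area->uses_large_pages) {
        /* the large mapping must be split into many small pages */
        printf("%s: split into %lu small PTEs -> more TLB misses\n",
               area->name, LARGE_PAGE_SIZE / SMALL_PAGE_SIZE);
        area->uses_large_pages = 0;
    } else {
        printf("%s: already 4 kB-mapped, nothing to split\n", area->name);
    }
}

int main(void)
{
    struct mapping kernel_image = { "kernel text/data", 1 };
    struct mapping small_mapped = { "area already mapped via small pages", 0 };

    map_xen_page(&kernel_image);  /* the old, slow placement */
    map_xen_page(&small_mapped);  /* the fix: reuse a 4 kB-mapped area */
    return 0;
}
```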

What’s to Come

The main topics for the near future will be:

  • PVH dom0 support: Some features, like PCI passthrough, are still missing. Another very hot topic for PVH dom0 support will be performance. Some early tests using a FreeBSD kernel able to run as a PVH dom0 indicate that creating domains from a PVH kernel will be much slower than from a PV kernel. The reason is the huge number of hypercalls needed for domain creation. Calling the hypervisor from PVH is orders of magnitude slower than from PV (the difference between VMEXIT/VMENTER and INT/IRET execution times on the processor). I already have some ideas on how to address this problem, but they would require some hypervisor modifications.

Another performance problem is backend operation, which likewise suffers from hypercalls being much slower in PVH. Here, too, a possible solution could be an appropriate hypervisor modification.

  • There are several enhancements regarding PV devices (sound, multi-touch devices, virtual displays) in the pipeline. These will be needed for a project using Xen as a base for automotive IT.

This topic will be discussed during the Xen Project Developer and Design Summit happening in Budapest, Hungary from July 11 to 13. Register for the conference today.

Cloud Foundry’s Abby Kearns Talks Inclusion, Interfaces

At the end of an action-packed Cloud Foundry Summit Silicon Valley 2017 earlier this month, Abby Kearns, the Cloud Foundry Foundation executive director, sat down with TNS founder Alex Williams to ponder a very challenging year ahead. Kubo, the platform’s new lifecycle manager that integrates Kubernetes, is now production-ready. And while you’d expect such a move to draw attention and participation from Google, it is Microsoft coming closer into the fold, as the Foundation’s newest gold-level member, that changes the ball game somewhat.

Listen now to “Look For More Open Source Extensions With Cloud Foundry This Year,” on The New Stack Makers podcast.

Read more at The New Stack

Is Blockchain the Land of Milk and Honey? 9 Experts Share their Concerns

Can blockchain transform the world? It is already doing that and, according to Chitra Ragavan, the Chief Communications Officer at Gem, a Los Angeles-based blockchain startup and one of our influencers, “blockchain technology has the potential to be transformative not only in the EU but throughout the world in coming years.” What does that mean for us?

We invited nine influencers to weigh in on the facets of the blockchain and explain why the industries that make the world go round see tremendous potential in this technology. This series consists of four parts that dissect the purposes and benefits of the blockchain and shed some light on the main concerns and obstacles.

In the first part of this interview series, we invited our blockchain influencers to talk about the blockchain’s impact on our lives and to weigh in on the importance of the legal factor in the blockchain’s healthy development.

Now it’s time to ask them about their concerns, the advantages of this technology, the obstacles to experimenting with it and the industries that cannot be disrupted by the blockchain.

Read more at Jaxenter

What Are Linux Logs? How to View Them, Most Important Directories, and More

Logging is a must for today’s developers, and it’s important to understand Linux logs, how to view them, and which logs are most important to your work. We wrote this mini-guide to give you all the need-to-know essentials in an easily digestible format. It won’t take up your entire lunch break – promise!

A Definition of Linux Logs

Linux logs provide a timeline of events for the Linux operating system, its applications, and the system as a whole, and they are a valuable troubleshooting tool when you encounter issues. Essentially, analyzing log files is the first thing an administrator needs to do when an issue is discovered.

Read more at DZone

What’s New in the Xen Project Hypervisor 4.9?

The Xen Project Hypervisor 4.9 release focuses on advanced features for embedded, automotive, and cloud-native computing use cases; enhanced boot configurations for more portability across different hardware platforms; the addition of new x86 instruction support to hasten machine-learning computing; and improvements to existing functionality related to the ARM® architecture, the device model operation hypercall, and more.

We are also pleased to announce that Julien Grall, Senior Software Engineer at ARM, will stay on as release manager for the Xen Project Hypervisor 4.10 release.

We grouped updates to the Xen Project Hypervisor into the following categories:

  • New Features
  • Improvements to Existing Functionality
  • Multi-Release Long-Term Development

Read more at Xen Project

Your Container Orchestration Needs: Kubernetes vs. Mesos vs. Docker Swarm

On a recent client project, I was asked to compare the three main players on the market for container orchestration and cluster management solutions: Kubernetes, Docker Swarm, and Mesos (a.k.a. Data Center Operating System). Last October, the team created a feature comparison matrix, which I had to adapt slightly to reflect the current status as of this writing (June 2017). In this post I’m not going to go through the feature comparison, but rather discuss your needs to find out which solution(s) might be the right choice for your business.

Depending on the real business needs and the preferences of your developers and operators, it might be that two or even all three options have to coexist to fulfill a broader range of use cases in larger enterprises.

The reality is that, for many enterprises, the decision about the right container orchestration and cluster management solution depends on their current and future needs.

Read more at Kubernauts

What Are the Leading Software Platforms for NFV Infrastructure?

As service providers report a number of successful production deployments of network functions virtualization, it is important to consider the infrastructure beneath it all — and the available options. The leading software platforms for NFV infrastructure are OpenStack and VMware’s vCloud NFV.

But service providers can choose from a number of OpenStack options, including sourcing from a supplier or open source internal development.

NFV deployments’ use cases vary widely and include wireless core networks, customer premises equipment, routing, security and the internet of things. During the second half of 2017 and throughout 2018, leading service providers will deploy NFV at scale with additional applications across their networks. 

Read more at TechTarget

Open Tools Help Streamline Kubernetes and Application Development

Organizations everywhere are implementing container technology, and many of them are also turning to Kubernetes as a solution for orchestrating containers. Kubernetes is attractive for its extensible architecture and healthy open source community, but some still feel that it is too difficult to use. Now, new tools are emerging that help streamline Kubernetes and make building container-based applications easier. Here, we will consider several open source options worth noting.

Microsoft’s Kubernetes Moves

Microsoft has just open sourced Draft, a tool that streamlines application development and deployment into any Kubernetes cluster. “Using two simple commands, developers can now begin hacking on container-based applications without requiring Docker or even installing Kubernetes themselves,” writes Gabe Monroy, PM Lead for Containers at Microsoft. “You can customize Draft to streamline the development of any application or service that can run on Kubernetes.”

In April, Microsoft acquired the Deis container platform from Engine Yard, and Draft is a direct result of that acquisition. “Draft targets the ‘inner loop’ of a developer’s workflow while developers write code and iterate, but before they commit changes to version control,” notes Monroy. “When developers run ‘draft create’ the tool detects the application language and writes out a simple Dockerfile and a Kubernetes Helm chart into the source tree. Language detection uses configurable Draft ‘packs’ that can support any language, framework, or runtime environment. By default, Draft ships with support for languages including Node.js, Go, Java, Python, PHP, and Ruby.”

You can see this process in action here.

In acquiring the Deis container platform from Engine Yard, Microsoft also became a steward, along with the Cloud Native Computing Foundation and several other organizations, of Helm, which is billed as “the best way to find, share and use software built for Kubernetes.” It is essentially an open Kubernetes package manager. “Helm Charts help you define, install and upgrade even the most complex Kubernetes application,” note the community leaders.

The Kubernetes blog notes the following about Helm: “There are thousands of people and companies packaging their applications for deployment on Kubernetes. This usually involves crafting a few different Kubernetes resource definitions that configure the application runtime, as well as defining the mechanism that users and other apps leverage to communicate with the application…We began to provide a home for Kubernetes deployable applications that provides continuous releases of well documented and user friendly packages. These packages are being created as Helm Charts and can be installed using the Helm tool. Helm allows users to easily templatize their Kubernetes manifests and provide a set of configuration parameters that allows users to customize their deployment.”

Red Hat’s New Angle on Kubernetes

Red Hat, too, is positioned to help users streamline their Kubernetes implementations. The company recently announced its intent to acquire San Francisco-based startup Codenvy, which gives developers options for building out cloud-based integrated development environments, including working with Kubernetes and containers. Codenvy is built on the open source project Eclipse Che, which offers a cloud-based integrated development environment (IDE). The OpenShift.io cloud-based container development service from Red Hat already integrates Codenvy’s Eclipse Che implementation.

In essence, Codenvy has DevOps software that can streamline coding and collaboration environments. According to Red Hat: “[Codenvy’s] workspace approach makes working with containers easier for developers. It removes the need to setup local VMs and Docker instances enabling developers to create multi-container development environments without ever typing Docker commands or editing Kubernetes files. This is one of the biggest pain points we hear from customers and we think that this has huge potential for simplifying the developer experience.”

“The rapid adoption of containers makes orchestration standards the industry’s next step. We held the view that Kubernetes and Red Hat OpenShift are leading the way in this space. So when Red Hat shared their container vision, our decision to join them became a no-brainer,” Codenvy CEO Tyler Jewell said.

The move toward containers shifts many types of dependencies pertaining to applications, and shifts how applications are created. Kubernetes has proven to be an essential orchestration tool as these shifts evolve, and it is good to see open tools arriving that can help streamline Kubernetes itself and make developing applications easier.

To learn more about Kubernetes, check out the sample course materials for Kubernetes Fundamentals (LFS258), an online, self-paced course developed by The Linux Foundation Training that gives a high-level overview of what Kubernetes is and the challenges it solves. Download a free sample chapter now.

openSUSE Leap Is Now 99.9% Enterprise Distribution

Two years ago, when openSUSE decided to move the base of openSUSE Leap to SUSE Linux Enterprise (SLE), they were entering uncharted territory. SLE is a tightly controlled enterprise ship that runs on mission-critical systems. On the other hand, openSUSE has been a community-driven project that, despite sponsorship from SUSE, is relatively independent.

It became clear, though, that moving to the SLE source code would solve many problems for both members of the SUSE family: SLE would get a platform from which it can borrow the latest fully tested packages, and openSUSE Leap would get an enterprise-grade code base to move into CentOS and Ubuntu territory. SLE and openSUSE created a symbiotic relationship in which they pull content from each other.

Moving closer

“Initially when we moved the base, our utopian vision was to have a 30-30-30 split from SLE, Tumbleweed and openSUSE into Leap,” said Richard Brown, openSUSE chairman.  

“The first version of openSUSE Leap (42.1) didn’t have that equilibrium and there was too much replacement of SLES components from the community. With 42.2, we moved closer and there was enough SLE and enough Tumbleweed and we inherited what we wanted from 42.1. But with the upcoming 43 release, we are exactly where we wanted to be. The base comprises SLE, so you have a fully enterprise grade base, then you have fast moving components on top of it that come from Tumbleweed, which allow you to stay updated on a very stable system. The way I look at it is upcoming release of Leap is 99.9 enterprise grade software; it’s our CentOS, just better and broader with the addition of integrated community packages,” he said.

Leap has essentially created a community platform for developers and sysadmins who run SUSE Linux Enterprise Server (SLES) in their datacenters. The strategy of moving the codebase to SLES has worked: openSUSE Leap has been a success so far, and now even companies like IBM contribute directly to Leap, as they know that’s the best and most open way to get things into SLES. Fujitsu is shipping Tumbleweed and Leap to their users, according to Brown.

Changing mission statement

Initially, openSUSE’s mission statement was to “encourage use of Linux & Open Source everywhere.” But that’s no longer the heart and soul of openSUSE. openSUSE has evolved beyond just a Linux distribution project; it now caters to a different audience: developers and sysadmins. So, the openSUSE board members drafted a new mission statement: “Openly engineered tools to change your world.” The mission statement is not final yet; once it’s discussed with the community and everyone is on board, it may become official.

“We work in the open, we share our opinions, which change over time as we learn more or things improve. We work on everything openly. What we do essentially is engineering: we help in building packages, we help in testing, and we help in delivering them. We care about the process,” said Brown. “At the same time, everything that we do is a tool. openQA is a testing tool, OBS is a packaging tool, YaST is a system management tool; even our distributions, Leap and Tumbleweed, are tools.”

openSUSE in Windows land

Microsoft is now bringing openSUSE to Windows users, through its WSL (Windows Subsystem for Linux) initiative. Microsoft and openSUSE projects have finalized all the “paperwork” and Rich Turner of Microsoft confirmed that openSUSE for Windows is in the works.

Brown said there will be two members of the SUSE family in the Windows Store: Leap 42.2 and SLES 12. This means users will be able to install and run command-line utilities from both of these platforms. While Leap will be available for free, SLES is subscription-based. However, SUSE has started a SUSE Developer Program that offers a one-year free subscription to SLES. Thus, developers have access to thousands of packages, tools, and utilities through either of the two platforms.

Many free software advocates may wonder whether this will affect the user base of Linux: if developers can access Linux utilities from within Windows, there won’t be any need to install a Linux desktop anymore. “We are a project that creates tools, and it doesn’t matter which platform runs those tools. You can use them on openSUSE or Windows. The idea is to help more people use our tools and get work done,” said Brown. “I think it will actually increase the reach of Linux, as users who would never have installed Linux will now be able to use these tools. Windows has a much larger market share than Linux, and these users will now have access to Linux tools.”

Incubating new ideas

As openSUSE evolves into a project that offers tools, Brown said they are also contemplating a new project called openSUSE Incubator. Since OBS allows developers to create packages and collaborate, over time this may raise questions about the quality of those projects.

“How do we ensure that these projects that are available through OBS are of openSUSE quality?” asked Brown. An existing model points the way: Apache Incubator, the place where the Apache Software Foundation incubates new projects.

openSUSE will look at projects that are not yet up to its standards and mark them as Incubator projects. The idea is to create a fertile, nurturing environment that enables developers to bring their projects to openSUSE and see them grow. As part of the Incubator, projects will get access to the OBS build service and effectively unlimited bandwidth from the openSUSE mirrors, will be hosted on the openSUSE infrastructure, and can be consumed by users directly.

However, that doesn’t mean anyone can “dump” their project at the openSUSE Incubator. Brown is working on some basic guidelines to ensure that projects at least share the same principles of openness as openSUSE and have a few maintainers. Projects will have the option to use openSUSE branding, but Brown stresses that, despite being part of the openSUSE Incubator, they will remain independent when it comes to branding. Many open source projects could benefit from a project like openSUSE Incubator.

Conclusion

Overall, the openSUSE community is heading in the right direction as our computing world changes. Instead of sticking to the operating system alone, it is expanding its reach and catering to what developers and sysadmins need.

Connect with the Open Source community at Open Source Summit, September 11-14 in Los Angeles, CA, with over 200 sessions covering everything from Cloud and Containers, to Security and Networking, to Linux and Kernel Development. Register now & Save $150.

Pivoting To Understand Quicksort [Part 2]

This is the second installment in a two-part series on Quicksort. If you haven’t read Part 1 of this series, I recommend checking that out first!

In part 1 of this series, we walked through how the quicksort algorithm works at a high level. In case you need a quick refresher, this algorithm has two important aspects: a pivot element and two partitions around the pivot.

We’ll remember that quicksort functions by choosing a pivot point (remember, this is only somewhat random!) and sorting the remaining elements so that items smaller than the pivot are to the left of, or in front of, the pivot, and items larger than the pivot are to the right of, or behind, the pivot. These two halves become the partitions, and the algorithm recursively calls itself on both of these partitions until the entire list is divided down into single-item lists. Then, it combines them all back together again.
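For reference, here is a compact quicksort sketch in C (using the common Lomuto partitioning scheme; the original article may use a different language or partitioning strategy):

```c
/* Quicksort with the Lomuto partition scheme: pick a pivot, move smaller
 * items in front of it and larger ones behind it, then recurse on both
 * partitions. */
#include <stdio.h>

static void swap(int *a, int *b) { int t = *a; *a = *b; *b = t; }

static int partition(int v[], int lo, int hi)
{
    int pivot = v[hi];        /* pivot choice is a policy decision */
    int i = lo;

    for (int j = lo; j < hi; j++)
        if (v[j] < pivot)     /* smaller items go to the left partition */
            swap(&v[i++], &v[j]);

    swap(&v[i], &v[hi]);      /* place the pivot between the partitions */
    return i;
}

static void quicksort(int v[], int lo, int hi)
{
    if (lo >= hi)             /* single-item (or empty) lists are sorted */
        return;

    int p = partition(v, lo, hi);
    quicksort(v, lo, p - 1);  /* recurse on the left partition */
    quicksort(v, p + 1, hi);  /* recurse on the right partition */
}

int main(void)
{
    int v[] = { 9, 4, 7, 1, 8, 2 };
    int n = sizeof(v) / sizeof(v[0]);

    quicksort(v, 0, n - 1);
    for (int i = 0; i < n; i++)
        printf("%d ", v[i]);
    printf("\n");
    return 0;
}
```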

Read more at Dev.to