September 21, 2006

A survey of open source cluster management systems

Author: M. Shuaib Khan

In the computing world, the term "cluster" refers to a group of independent computers combined through software and networking, often used to run highly compute-intensive jobs. With a cluster, you can build a high-speed supercomputer out of hundreds or even thousands of relatively low-speed systems. Cluster management software offers an easy-to-use interface for managing clusters, and automates the process of queuing jobs, matching a job's requirements against the resources available to the cluster, and migrating jobs across the cluster. Here's an introduction to five open source CMS applications.


openMosix, probably the most famous open source clustering solution, was started by Moshe Bar in 2002 to extend and provide an open source alternative to the Mosix CMS. openMosix is available as a patch to the Linux kernel that extends the ordinary kernel into a cluster-aware system. openMosix provides single-system image (SSI) clustering, which means that the multiple distributed resources present on the network appear to user applications as a single local resource. Its autodiscovery feature enables it to detect a new node at runtime and start using its resources, which means that a new node can be added to the cluster while openMosix is running.

openMosix uses load-balancing techniques to migrate jobs from a node with a high load to one with a lower load, where the job can run faster. The process of job migration is transparent, which means that the migrated job doesn't even know it has been migrated, and behaves as if it were running locally. openMosix can run a wide range of applications without any special programming required.
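The load-balancing idea can be pictured as follows: every node advertises its load, and when one node is much busier than another, a process migrates to the idle one. Here is a toy sketch of that decision, not openMosix's actual algorithm, which also weighs memory use and communication costs:

```python
# Toy illustration of load-based migration (not openMosix's real
# algorithm, which also accounts for memory and communication costs).

def pick_migration_target(loads, threshold=1.5):
    """Given a mapping of node name -> load average, return a
    (source, target) pair if the imbalance justifies migrating a
    process from the busiest node to the idlest one, else None."""
    source = max(loads, key=loads.get)   # most loaded node
    target = min(loads, key=loads.get)   # least loaded node
    if loads[source] > threshold * loads[target]:
        return source, target
    return None

# node2 is well over 1.5x the load of node3, so a job would migrate:
print(pick_migration_target({"node1": 2.0, "node2": 4.5, "node3": 1.0}))
# -> ('node2', 'node3')
```

In the real system this check runs continuously and transparently; the application never sees the move.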

To install openMosix, you have to download two files: the kernel patch and the userland tools. Debian users can use APT to install openMosix, and Gentoo users can run emerge.
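On those distributions the setup comes down to a couple of package-manager commands. The package names below are from the openMosix era and may differ between releases, so check your distribution's package search first:

```shell
# Debian (package names may vary between releases):
apt-get install kernel-patch-openmosix openmosix

# Gentoo: a patched kernel source tree plus the userland tools
# (again, verify the exact package names with `emerge --search openmosix`):
emerge sys-kernel/openmosix-sources sys-apps/openmosix-user
```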

One of the cons of openMosix is that it is kernel-dependent. It has a stable release for the 2.4 kernel, but the userland tools for the 2.6 kernel are still under development.


Kerrighed is another SSI clustering package. Like openMosix, it is available as a kernel patch plus a set of kernel modules. Kerrighed's default scheduling algorithm automatically transfers processes and threads to different nodes across the cluster in order to balance the load on the CPUs. The scheduling algorithm is customizable, and provides seamless migration of processes that use streams (sockets, pipes, character devices, etc.) without affecting communication performance. Kerrighed allows the migration of threaded applications, and even of an individual thread. It also offers process checkpointing, which means that processes can be paused on one cluster node and restarted on any other node. Kerrighed also supports distributed shared memory (DSM), which means that each node has access to a large shared memory area in addition to its limited private memory.
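Checkpointing amounts to serializing a process's state so that another node can resume it later. Kerrighed does this at the kernel level for real processes; the following Python sketch only illustrates the save-and-restore cycle with a toy computation:

```python
import pickle

# Illustration only: Kerrighed checkpoints real processes in the kernel.
# Here we checkpoint a toy computation's state and resume it elsewhere.

def run(state, steps):
    """Advance a counter-based computation by a given number of steps."""
    for _ in range(steps):
        state["total"] += state["step"]
        state["iterations"] += 1
    return state

state = {"total": 0, "step": 2, "iterations": 0}
state = run(state, 5)                 # do some work on "node A"
checkpoint = pickle.dumps(state)      # freeze the state as bytes

resumed = pickle.loads(checkpoint)    # thaw the state on "node B"
resumed = run(resumed, 5)             # continue exactly where A left off
print(resumed)   # {'total': 20, 'step': 2, 'iterations': 10}
```

The kernel-level version captures much more (registers, memory, open files), but the principle is the same: frozen state travels, execution resumes.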

One drawback of Kerrighed is that it does not allow the addition or removal of a node while the cluster is running.

Kerrighed is currently available for Linux systems using Intel processors (IA32), and requires a shared filesystem across the cluster. Kerrighed can be used with kernel versions 2.4.29, 2.4.24, and 2.2.13. A release that will work with kernel version 2.6.11 is under development.


OpenSSI, as the name suggests, is yet another single-system image clustering product. Its home page states the project's goals as "availability, scalability, and manageability, built from standard servers." OpenSSI allows the addition and removal of nodes while the cluster is running. It uses a process migration mechanism derived from Mosix to dynamically balance the cluster's CPU load. OpenSSI also allows the migration of threaded processes.

One of the main features of OpenSSI is its single root filesystem, enforced across the cluster. When an ext2 or ext3 filesystem is mounted on any node, it is automatically stacked, and all nodes in the cluster instantly see its mount point. There is no need to install a copy of a Linux distribution on every node of the cluster: a distribution is installed on the first node, and after a successful setup of OpenSSI there, additional nodes are added to the cluster simply by network-booting them.

OpenSSI's process management is complete and robust. Processes are handled cluster-wide, and each process on the cluster has a single PID. Inter-process communication (IPC) is handled cluster-wide as well.

OpenSSI has been tested to work successfully with many of the open source high-performance computing (HPC) middleware products, such as MPICH, LAMPI, HP MPI, and openPBS. Many different types of servers have been tested on OpenSSI, such as LTSP (Linux Terminal Server Project), Apache (1.3, 2.0), Jakarta Tomcat 4/5, BEA WebLogic Server 9, MySQL Standard and Max with NDB Cluster, Sybase, PostgreSQL, Sendmail, Postfix, Dovecot, SpamAssassin, ClamAV, and many more -- basically anything that works with kernel 2.4 and 2.6.

OpenSSI's most notable limitation is the number of nodes supported per cluster, which is 125. OpenSSI is currently available for Fedora, Debian, and Red Hat 9. Work on a version for SUSE 9.2 is in progress.


Gluster, a GNU clustering platform, is a cluster distribution aimed at commoditizing supercomputing and superstorage. It provides a platform for developing applications geared toward specific tasks such as HPC clustering, storage clustering, enterprise provisioning, and database clustering.

Gluster runs on Intel IA32 or x86-64 systems with at least 512MB of RAM. It is distribution-independent and has been tested with a wide range of distributions, including Debian, Fedora, Ubuntu, Red Hat, Slackware, and Scientific. You can download Gluster's ISO image and burn it to a bootable CD.

Gluster comes bundled with cluster applications for specific tasks, such as GlusterHPC for high performance computing and GlusterEP for system provisioning and automated platform management. GlusterFS is a cluster filesystem that can be scaled up to petabytes.

One of Gluster's cool features is that parts of it can be extended using Python scripts. Each extension consists of two files. A specification file contains metadata about the extension, such as the name of the maintainer and the type of extension: whether it is an application, a library, or a tool. The actual contents of the extension go in a .tgz archive that is extracted when the extension is installed. The extension itself can be written as a shell script or a Python script.
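The two-file layout can be sketched as follows. Note that the specification keys and file names here are invented for illustration; the real field names and packaging conventions are defined by Gluster's extension documentation:

```python
import io
import tarfile
import textwrap

# Hypothetical sketch of Gluster's two-file extension layout: a metadata
# spec file plus a .tgz holding the extension's actual contents. The key
# names ("name", "maintainer", "type") are illustrative, not Gluster's.

spec = textwrap.dedent("""\
    name: hello-tool
    maintainer: Jane Doe <jane@example.com>
    type: tool
""")

script = textwrap.dedent("""\
    #!/usr/bin/env python
    print("hello from a Gluster extension")
""")

# Write the spec file alongside the archive.
with open("hello-tool.spec", "w") as f:
    f.write(spec)

# Pack the extension script into the .tgz the installer would extract.
with tarfile.open("hello-tool.tgz", "w:gz") as tgz:
    data = script.encode()
    info = tarfile.TarInfo("hello-tool/run.py")
    info.size = len(data)
    tgz.addfile(info, io.BytesIO(data))
```

The installer's job is then just to read the metadata, unpack the archive, and register the extension.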


If you don't happen to have a cluster of local computer systems on hand, you can use BOINC to set up a single Linux server for your project, and volunteers from around the globe can let you use their systems' resources by joining your project. Everyone who wishes to contribute resources to your project has to download and set up a BOINC client, which acts as the communicator between the server and the resource provider.

Originally developed out of the SETI@home project, BOINC now runs on more than 400,000 computers worldwide, which operate at the massive aggregate speed of 613.851 TeraFLOPS, according to BOINCstats.

BOINC programs can be built for and run on multiple platforms, such as Microsoft Windows (95 or later) and Linux on an Intel x86-compatible processor, Mac OS on Motorola PowerPC or Intel, and Solaris 2.7 or later on a SPARC-compatible processor.

The flow of data between the BOINC server and clients travels over ordinary commercial Internet connections, which can be slow, making BOINC unsuitable for applications that produce or consume more than a gigabyte of data per day.
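To put that limit in perspective, a gigabyte per day is a modest sustained rate, but it must flow continuously over each volunteer's consumer link. A quick back-of-the-envelope calculation:

```python
# Sustained bandwidth needed to move 1 GB of data per day.
gigabyte = 10**9          # bytes
seconds_per_day = 86_400

bits_per_second = gigabyte * 8 / seconds_per_day
print(round(bits_per_second / 1000))   # ~93 kbit/s, sustained around the clock
```

Roughly 93 kbit/s around the clock was near the upstream capacity of the era's typical DSL lines, which is why data-heavy projects were a poor fit for volunteer computing.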


Creating a high-speed cluster of computers has never been so easy. Open source software for cluster management is giving proprietary alternatives a run for their money.

In addition to the above products, other open source clustering products include PVM, OSCAR, and Grid Engine. The suitability of a particular clustering software depends on the type of applications to be run on the cluster.
