November 16, 2005

Condor: Building a Linux cluster on a budget

Author: Bruno Gonçalves

So you need a lot of computing power but don't want to spend tens of thousands of dollars on a commercial cluster? Or maybe you just have a lot of machines sitting idle that you would like to put to good use? You can build a powerful and scalable Linux cluster using only free software and off-the-shelf components. Here's how.

To build our cluster we are going to use three pieces of software:

Fedora Core 3 will be our base distribution, since it's freely available and well supported by both DRBL and Condor. For the purposes of this tutorial we'll assume that you already have FC3 installed on the machine that is to become your server. The server machine will be responsible for storing and serving all the files necessary for the client machines to be able to work. It will also act as a firewall between the cluster and the outside world.

DRBL on its own can create a set of machines that could be used as thin clients in, say, a classroom setting. By adding the Condor clustering software we turn this set of machines into a computing cluster that can perform high-throughput scientific computation on a large scale. You can submit serial or parallel computing jobs on the server, and Condor takes care of distributing the jobs to idle cluster machines, if any, or putting them in a queue until the required resources are available. Condor can also perform periodic checkpoints on the jobs and restart them if something causes the machine on which they are running to reboot.

The end product is very stable. You can add and remove compute nodes from the system without having to reconfigure anything. Designating a machine as a member of the cluster requires just a small penalty of hard drive space on the server to store some configuration files; in my cluster, config files for 72 machines use less than 3GB of space on the server.

DRBL installation

In this example we will assume that our server has three network interfaces: eth0, to connect to the outside world, and eth1 and eth2 to connect to the client machines. We're using two different subnets for the client machines in order to spread the network traffic over two different routers. Our client machines are cheap boxes with nothing but a motherboard, CPU, RAM, and network card.

Before you start installing DRBL you should configure eth1 and eth2 with private IP addresses (such as 192.168.x.x) and set the clients' BIOSes to boot via the network card.

DRBL is available as a small RPM package. To set it up we are going to be using two installation scripts: drblsrv, to configure the server, and drblpush, to configure the clients. Start with drblsrv by running:

/opt/drbl/setup/drblsrv -i

This Perl script installs the packages that DRBL requires, such as DHCP and TFTP, if they're not already installed, and asks if you want to update the system. If you choose to perform the update, it will run YUM to update all the packages for which there is a newer version (similar to what would happen if you were to run up2date manually). The default options work well for most cases. You just have to remember that if you downloaded the DRBL RPM from the testing or unstable directories to select Yes when it asks if you want to use the testing or unstable packages.

After drblsrv finishes (which can take a fair amount of time, especially if you have a slow Internet connection and you asked to update the system) it's time to run drblpush:

/opt/drbl/setup/drblpush -i

This second Perl script is the workhorse of the system. It configures and starts all the services required to make the cluster work. It automatically detects the network interfaces that have private IP addresses assigned to them and asks you how many clients you want to set up in each of them. For added security, you can bind the booting process (via DHCP) to the MAC addresses of your clients. This feature is useful if you are setting up your system in the open (in a classroom, for instance) where anyone can plug in a new machine without you knowing about it. We'll assume our system is behind closed doors and accessible to users only by SSHing to the server, so we'll choose N for this option and simply tell DRBL how many clients we want on each network interface. They all get IP addresses on a first-come, first-served basis at boot time.

At this stage of the configuration process DRBL will present you with a little graphical view of your setup:

          NIC    NIC IP                    Clients
+-----------------------------+
|         DRBL SERVER         |
|                             |
|    +-- [eth0] xxx.xxx.xxx.xxx +- to WAN
|                             |
|    +-- [eth1] 192.168.110.254 +- to clients group 1 [ 24 clients, their IP
|                             |            from 192.168.110.1 - 192.168.110.24]
|    +-- [eth2] 192.168.120.254 +- to clients group 2 [ 24 clients, their IP
|                             |            from 192.168.120.1 - 192.168.120.24]
+-----------------------------+

In this example we made room for 24 machines on each subnet. If your configuration doesn't look the way you intend it, you can press Ctrl-C and restart from scratch; nothing has been changed in your system yet. After this point, the only option you might want to change from the default is the one referring to the boot mode. Since these machines will be used only for number-crunching, you can set the boot mode to "2" for text mode so you don't waste resources starting an X server that will never be used. You can safely leave all the other options at default values, or modify them to serve your specific goals. If at a later date you change something on the server and you want those changes to propagate to the clients you need to rerun drblpush.

In this case you can choose to keep the old client settings and just export the changes or to redo everything from scratch. After DRBL finishes successfully, you can boot your client machines and if everything went according to plan you should now have a basic cluster system up and running. If you run in to any problems, you can use the project's support forum. My experience has been that you typically get an answer back from one of the developers within about a day.

Condor installation

To make the most of our new Linux cluster we will install Condor, a batch system developed by the University of Wisconsin. Download the latest tarball, uncompress it, and run # ./condor_install

The installation script provides plenty of details on what each option means, so you shouldn't have many problems setting it up. You must, however, keep in mind that everything is shared between the server and the clients and answer accordingly when it asks you about the file systems, accounts, and password files. Also, if you want to be able to make use of Java support you need to have Sun's Java virtual machine installed prior to installing Condor. (If you install it afterwards you can sort through Condor's configuration files or redo the Condor installation.) Message Passing Interface support requires an mpich installation (LAM's implementation does not play nicely with Condor) and possibly tinkering with the configuration files after the installation. For further details on Condor installation and its many features, refer to the Condor manual online and to the very active mailing list that you are invited to subscribe to when you download the software.

After the installation, start Condor by running:

# /usr/local/condor/bin/condor_master

This command should spawn all other processes that Condor requires. On the server you should be able to see five condor_ processes running:

# ps -ef | grep condor
condor   16418     1  0 14:18 ?        00:00:00 condor_master -f
condor   16477 16418  0 14:20 ?        00:00:00 condor_collector -f
condor   16478 16418  0 14:20 ?        00:00:00 condor_negotiator -f
condor   16479 16418  0 14:20 ?        00:00:00 condor_schedd -f
condor   16480 16418  0 14:20 ?        00:00:00 condor_startd -f
root     16503 16382  0 14:20 pts/4    00:00:00 grep condor
#

When you start Condor on the clients you should see all of the above except condor_collector and condor_negotiator. Sometimes the installation program gets a bit confused with the fact that the server has several hostnames (the usual hostname for eth0, nfsserver_eth1, and nfsserver_eth2 respectively); in those cases, running condor_master on the server will start only the services that are used by a client machine. You can easily fix the problem by using condor_configure to force the current machine (with all its hostnames) to be the server. This will change the relevant configuration files, after which you must tell Condor to reread them:

# ./condor_configure --type=submit,execute,manager
# /usr/local/condor/sbin/condor reconfig
Sent "Reconfig" command to local master
#

After that you should be able to see the server machine as part of your Condor cluster when you run condor_status.

$ condor_status

Name          OpSys       Arch   State      Activity   LoadAv Mem   ActvtyTime

underdark     LINUX       INTEL  Unclaimed  Idle       0.240  1518  0+00:20:04

                     Machines Owner Claimed Unclaimed Matched Preempting

         INTEL/LINUX        1     0       0         1       0          0

               Total        1     0       0         1       0          0
$

If you now run condor_master you should see the clients being added to this list within a few minutes (usually around five to 10 minutes, but the more machines you have the longer it takes for all of them to finish handshaking with the server).

There are a number of tutorials available online (not to mention the official manual) that teach you the basics of using Condor, but the mechanics of it are fairly simple, especially if you've used other batching systems before.

Getting work done

To submit a job to the cluster, you need to create a submit script -- a text file that should look something like this:

Executable = hello
Universe = Vanilla
Output = hello.out.$(PROCESS)
Input = hello.in.$(PROCESS)
Error = hello.err.$(PROCESS)
Log = hello.log
Queue 3

In this example our executable file is called hello (the traditional "Hello World" program). We're using the Vanilla universe (just your normal run-of-the-mill executable; check the manual to find out about other universes). The Input, Output, Error, and Log directives tell Condor which files to use for stdin, stdout, and stderr, and to log the job's execution. Finally, the Queue directive specifies how many copies of the program to run. The $(PROCESS) macro allows each job to use a different set of files, so job 0 would use hello.*.0, job 1 uses hello.*.1, etc.

After you have the submit file ready you can submit it to Condor by running condor_submit hello.sub. You can check on the status of your job using condor_q, which will tell you how many jobs are on the queue, their IDs, and whether they're running or idle, along with some statistics.

Condor has many other features; what I have shown you here is just the basics of how to get it up and running. When reading the manual, you should pay particular attention to the Standard universe, which allows you to checkpoint your job (at the cost of specially linking it) and the Java Universe, which allows you to run Java jobs seamlessly.

You can also add Condor to the boot sequence of your server and clients (follow the instructions in the manual and rerun drblpush). By doing this and setting the DHCP server to allow more machines than you currently have, you can easily add machines to the cluster by simply connecting them to one of the subnets and booting them. They will start Condor on their own, connect to the central manager, and become available to run jobs in just a few minutes.

You can shut down cluster machines and their jobs will continue (in the Standard universe) or restart on a different machine (in the Vanilla universe). This allows for a lot of flexibility in managing the system. Also, by starting and stopping the DHCP server at a given time you can change the booting process for the clients. This is useful if you have a computer lab full of machines that need to run Windows during the day but that are free to be added to the cluster at night. Just modify the boot device sequence of the clients BIOS to first boot from the network and only then boot from the hard drive. If the DHCP server is on they will boot from it and join the cluster; if not (in the morning for instance), the attempt to boot from the network will fail and they will simply boot from the hard drive.This will allow you to put all those Windows machines to good use without having to change their settings (other than the BIOS and possibly a Scheduler job to have them boot out of Windows at night).

Click Here!