April 10, 2006

Building a Linux supercomputer using SSH and PVM

If you have a couple of old Linux boxes sitting around, then you've got the makings of a supercomputer. Dust them off, install Secure Shell (SSH) and Parallel Virtual Machine (PVM), and start your complex algorithms.

All right, it's not quite as simple as that. PVM handles only the messaging between the machines. You must write your own programs to actually do anything.

First, network your PCs and set up NFS on each. I'm not going to go into detail because most Linux distributions take care of everything for you. With Debian, for example, simply connect a cable between your new PC and your network switch, stick in your installation CD, switch the PC on, and follow the prompts. If you need more information, take a look at the Linux.com how-tos on networking and NFS.

Now you can start setting up your PCs as a single supercomputer. In order for them to work as one, you need a single home directory -- hence, the need for NFS. Choose the machine that hosts the home directory and edit /etc/exports. If the file isn't there, then you must set up the PC as an NFS server -- check your distro's documentation. If you're using Debian, simply type sudo apt-get install nfs-kernel-server.

Now add in the details for each of the hosts where you want the common home directory. In this example, I'm exporting my home directory from polydamas (my NFS server) to three hosts: acamas, cassandra, and hector:

/home acamas(rw)
/home cassandra(rw)
/home hector(rw)
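
With more than a handful of hosts, typing these lines by hand gets tedious. As a minimal sketch, the helper below prints one exports line per host with the same rw option used above. The name export_lines is my own invention, not a standard tool, and you should review the output before appending it to /etc/exports.

```shell
#!/bin/sh
# Hypothetical helper: print one /etc/exports line per client host.
# Usage: export_lines DIRECTORY HOST...
export_lines() {
    dir=$1
    shift
    for host in "$@"; do
        printf '%s %s(rw)\n' "$dir" "$host"
    done
}

# Example: the same three hosts as above.
export_lines /home acamas cassandra hector
```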

You can see the full list of possible options when exporting by typing man exports on the command line. Don't forget to add all hosts into your /etc/hosts file too.

Now either reboot your NFS server or check your distro's documentation for the relevant command that lets your hosts see the exports. On Debian, the command is exportfs -a.

You can now turn to your NFS client hosts and set them up so that they use the home directory that you're exporting from the NFS server. If you feel that exporting the whole /home is overkill, simply export the home directory for the user that you want to be able to run the supercomputer.

If you're confident that everything is going to work, move the current /home out of the way by renaming it (to /home_old, for example). Run mkdir /home, then edit your /etc/fstab file so that it contains the details for the NFS server:

polydamas:/home /home nfs rw,sync 0 0

Make sure that your /etc/hosts file contains the IP address for your server, then either reboot or reload the NFS data:

sudo /etc/init.d/mountnfs.sh

If you're not quite that brave, mount the directories manually before you commit to automating the process fully.
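
A manual test mount might look something like the sketch below. The mount point /mnt/home_test is a hypothetical scratch directory that you would create first, and try_mount and DRY_RUN are names made up for this example. With DRY_RUN set, the function only prints the command it would run.

```shell
#!/bin/sh
# Sketch: test-mount the exported /home on a scratch mount point.
# Assumptions: /mnt/home_test is a hypothetical, already-created empty
# directory; DRY_RUN is an illustrative switch, not a mount option.
try_mount() {
    server=$1
    mountpoint=$2
    # When DRY_RUN is set, "echo" is prefixed and the command is only printed.
    ${DRY_RUN:+echo} sudo mount -t nfs "$server:/home" "$mountpoint"
}

# Preview the command without touching the system:
DRY_RUN=1
try_mount polydamas /mnt/home_test
```

Unset DRY_RUN and call try_mount again to perform the mount for real. Once you're happy with the result, sudo umount /mnt/home_test undoes it.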

Set up SSH

Now that you have a common /home, you need SSH. Chances are, your Linux distribution came bundled with SSH. Each of my machines uses Debian, which loads OpenSSH automatically.

Set up SSH so that you don't have to enter a password each time you use it. For more information, take a look at Joe Barr's "CLI Magic: OpenSSH" and Joe 'Zonker' Brockmeier's "CLI Magic: More on SSH."

You'll find yourself benefiting from a common /home directory. Instead of having to set up an authorized_keys2 file on each machine, you only have to do it once on the NFS server:

ssh-keygen -t dsa
cat .ssh/id_dsa.pub >> .ssh/authorized_keys2
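
If you rerun the setup, or later add keys from more than one machine, it helps to append each key only once. The following is a minimal sketch under that assumption; add_key is a hypothetical helper name. It also tightens the file permissions, because sshd's default StrictModes setting refuses key files that other users can write to.

```shell
#!/bin/sh
# Sketch: append a public key to an authorized-keys file only if it is not
# already present, then restrict permissions on the file and its directory.
# add_key is a hypothetical helper name.
add_key() {
    pubkey=$1      # e.g. ~/.ssh/id_dsa.pub
    authfile=$2    # e.g. ~/.ssh/authorized_keys2
    touch "$authfile"
    # -x: match the whole line, -F: fixed string, -q: no output
    grep -qxF -- "$(cat "$pubkey")" "$authfile" || cat "$pubkey" >> "$authfile"
    chmod 600 "$authfile"
    chmod 700 "$(dirname "$authfile")"
}

# Usage:
# add_key ~/.ssh/id_dsa.pub ~/.ssh/authorized_keys2
```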

If you just want to be able to run processes in parallel, then you're ready to go.

Looking for more? You might want to create programs that use the resources of all of your machines. Let's say you have three Linux boxes connected to your network, and you have three shell scripts sitting in your home directory that you need to process. Simply run each one via SSH:

#Run the files on the machines
ssh bainm@acamas ./batch_file1 &
ssh bainm@cassandra ./batch_file2 &
ssh bainm@hector ./batch_file3 &
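
As a variation on the commands above, the shell's built-in wait can at least tell you when every background job has finished. This is only a sketch: run_all is a hypothetical function, and REMOTE is an override I've added so that you can try the loop without a network (REMOTE=echo prints each command's arguments instead of running ssh).

```shell
#!/bin/sh
# Sketch: start one script per host in the background, then wait for all.
# REMOTE is a hypothetical override; it defaults to ssh.
REMOTE=${REMOTE:-ssh}

run_all() {
    user=$1
    shift
    n=1
    for host in "$@"; do
        # Pair host number N with ./batch_fileN, as in the example above.
        $REMOTE "$user@$host" "./batch_file$n" &
        n=$((n + 1))
    done
    wait    # returns once every background job has exited
    echo "all jobs finished"
}

# Usage:
# run_all bainm acamas cassandra hector
```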

You can distribute work around your network easily using this technique. Useful as it is, this approach gives you no feedback: you must check each machine manually for the progress of each file before you continue with your computations. However, you can add feedback by making each of the distributed files write its results back to a common file in your home directory.

In this next example, you can calculate pi to any number of decimal places:

#File name: calc_pi
RESULT_FILE=$1
DECIMAL_PLACES=$2
RESULT=$(echo "scale=$DECIMAL_PLACES;4*(4*a(1/5)-a(1/239))"|bc -l)
echo "$(uname -n) Pi: $RESULT" >> $RESULT_FILE

I calculated pi = 4 x (4 arctan(1/5) - arctan(1/239)) because that's what I was taught in college; there are other ways.

Now tell each of your machines to run a process:

ssh bainm@acamas . ./calc_pi pi_results 10 &
ssh bainm@cassandra . ./calc_pi pi_results 20 &
ssh bainm@hector . ./calc_pi pi_results 30 &

After a couple of seconds, a new file (pi_results) contains these results:

acamas Pi: 3.1415926532
cassandra Pi: 3.14159265358979323848
hector Pi: 3.141592653589793238462643383272
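
Because every result line starts with the host name, a script can check whether all the machines have reported before it carries on. Here is a minimal sketch; all_reported is a hypothetical helper name, and it assumes that uname -n on each machine returns the same short host name you pass in.

```shell
#!/bin/sh
# Sketch: succeed only when every named host has written a line to the
# results file. all_reported is a hypothetical helper name.
all_reported() {
    file=$1
    shift
    for host in "$@"; do
        # calc_pi writes lines of the form "<hostname> Pi: <value>".
        grep -q "^$host Pi:" "$file" 2>/dev/null || return 1
    done
    return 0
}

# Usage: poll once a second until all three results are in.
# until all_reported pi_results acamas cassandra hector; do sleep 1; done
```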

Let PVM do the work for you

While this is useful to know, you're probably better off using software that does all the work for you. If you're happy using C, C++, or Fortran, then PVM may be for you. Download it from the PVM Web site, or check if you can load it using your distro's methods. For instance, use this command on Debian:

sudo apt-get install pvm

Install PVM on all of the machines, then log on to the computer you want to use as your central host. If it's not your NFS server, remember to generate a key for it and add it to the .ssh/authorized_keys2 file. Once you start PVM by typing pvm on the command line, you can start adding hosts. Don't worry about starting PVM on the other machines -- that's done automatically when you add a host.

$ pvm
pvm> add acamas
add acamas
1 successful
                    HOST     DTID
                  acamas    80000
pvm>

If that seems a bit long-winded, then list your hosts in a file and get PVM to read it:

$ pvm hostfile
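
The hostfile itself is plain text with one host name per line; lines that start with # are treated as comments. As a sketch, the hypothetical write_hostfile helper below generates one for the hosts used in this article.

```shell
#!/bin/sh
# Sketch: write a PVM hostfile, one host name per line.
# write_hostfile is a hypothetical helper name.
write_hostfile() {
    out=$1
    shift
    {
        echo "# hosts to add when the PVM console starts"
        printf '%s\n' "$@"
    } > "$out"
}

# Usage:
# write_hostfile hostfile acamas cassandra hector
```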

Type conf to check which hosts are loaded:

pvm> conf
conf
4 hosts, 1 data format
                    HOST     DTID     ARCH   SPEED       DSIG
               cassandra    40000    LINUX    1000 0x00408841
                  acamas    80000    LINUX    1000 0x00408841
                  hector    c0000    LINUX    1000 0x00408841
               polydamas   100000    LINUX    1000 0x00408841
pvm>

Type quit to exit the PVM console while leaving the PVM daemon running in the background. Type halt to shut down PVM.

Now you can create a program that uses PVM. You need the PVM source code. As always, check the details for your distro -- usually, you can get the files easily. For example, Debian uses this command:

sudo apt-get install pvm-dev

You need the files on only one of your machines; thanks to the common home directory, you can use any of them. Create a directory called ~/pvm3/examples and look for a file called examples.tar.gz -- you'll probably find it in /usr/share/doc/pvm. Unpack this into the directory you just created. You'll see a set of self-explanatory files that show you exactly how to program with PVM. Start with master1.c and its associated file slave1.c. Examine the source code to see exactly how the process operates. To see it in action, type:

aimk master1 slave1

aimk -- the program for compiling your PVM programs -- creates your executables and places them in ~/pvm3/bin/LINUX. Simply change to this directory and run master1 (type ./master1 if the current directory is not on your PATH). Assuming you're on the machine where you're running PVM, you should see something like this:

$ master1
Spawning 12 worker tasks ... SUCCESSFUL
I got 1300.000000 from 7; (expecting 1300.000000)
I got 1500.000000 from 8; (expecting 1500.000000)
I got 100.000000 from 1; (expecting 100.000000)
I got 700.000000 from 4; (expecting 700.000000)
I got 1100.000000 from 0; (expecting 1100.000000)
I got 1700.000000 from 9; (expecting 1700.000000)
I got 1900.000000 from 10; (expecting 1900.000000)
I got 2100.000000 from 11; (expecting 2100.000000)
I got 1100.000000 from 6; (expecting 1100.000000)
I got 900.000000 from 5; (expecting 900.000000)
I got 300.000000 from 2; (expecting 300.000000)
I got 500.000000 from 3; (expecting 500.000000)
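
If you'd rather run your PVM programs from any directory, you can put the binaries directory on your PATH. The sketch below assumes the ~/pvm3/bin/LINUX location mentioned above; path_add is a hypothetical helper that avoids adding the same directory twice.

```shell
#!/bin/sh
# Sketch: put the PVM binaries directory on PATH unless it is already there.
# path_add is a hypothetical helper name.
path_add() {
    dir=$1
    case ":$PATH:" in
        *":$dir:"*) ;;                  # already present, nothing to do
        *) PATH="$dir:$PATH" ;;
    esac
    export PATH
}

path_add "$HOME/pvm3/bin/LINUX"
```

Adding these lines to your shell startup file would make the change permanent for that user on every machine, since the home directory is shared.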

If you're a Fortran programmer, don't worry -- there are some examples for you as well. No examples are bundled for other languages, but the PVM Web site lists support for numerous languages, including Perl, Python, and Java. You'll also find various applications to help with PVM, such as XPVM for a graphical interface.
