March 2, 2006

Planetary lab's research orbits around Scyld cluster

Author: Tina Gasperson

A Scyld Beowulf cluster from Penguin Computing is helping researchers at the University of Arizona Lunar and Planetary Lab (LPL) boost computing resources and produce research results more quickly.

The lab is home to almost a dozen separate research groups of five to 12 researchers, each with his or her own computing needs. The groups conduct research on topics such as space physics, planetary occultations, and spacecraft missions. Sometimes LPL researchers study material that has gone into space or been brought back from space by NASA, and they race to be the first to publish findings in scientific journals.

All this activity takes a lot of computing power, but for years the lab's computing capacity was stunted by a mishmash of disparate workstations, servers, and ad hoc networks that had to be assembled and dismantled over and over again.

Last May, former UofA systems analyst Josh Bernstein began looking at alternatives to the "nightmare" of supporting many different research groups. "I couldn't easily control all the individual machines," Bernstein says. "There were a lot of scripts I wrote to try to manage it. It was very decentralized."

Part of the problem was that whenever a new research project began, the professors and graduate students had to ask for more computers to perform the complex calculations the work required. Researchers would move machines from room to room, install new software, and change configurations, leaving Bernstein responsible for putting everything back the way it was when the research was over.

To bring order to this chaos, Bernstein began to push for a cluster solution that would give him the flexibility and control he needed as an IT admin, but still accommodate the complexity that arises when many different research projects are running at once.

Bernstein weighed the options, including systems from Sun, Hewlett-Packard, and Microway. "They were all trying to sell us an open source solution," he says. "And a lot of them said, 'We'll run whatever you want,' which seems like a good thing until you ask yourself what you really want."

Penguin Computing's cluster seemed to have the answer to Bernstein's question. "Part of it was standardizing on an operating system. All the different researchers had been running different things -- some were on Solaris, some people were set on Red Hat, and some wanted Debian," he says.

The Penguin Beowulf cluster consists of 46 dual-CPU Opteron nodes and runs Scyld, Penguin's in-house Linux operating system specifically designed for Beowulf clusters. Users log in to the cluster to perform CPU-intensive calculations and processes, but still keep their individual desktops.

The biggest benefit for Bernstein was gaining the ability to manage the entire cluster through one interface. "Because the cluster is essentially one machine, you can put all possible configuration combinations on it, and the users can pick and choose their tools," he says. "I don't mind the complexity of one machine. The complexity of 40 to 100 machines was overwhelming."

Now that the new cluster is running, researchers at the Planetary Lab are able to do more complex calculations, five to 10 times faster than before. "We gave them more resources in an on-demand fashion, and they were able to get more done," Bernstein says.

"The bottom line is capability. The professors can go out for better grants. They can do science that has been unheard of. It allows scientists to go out and be more aggressive with their proposals, and the more grants they bring in the more money the lab has. The key really is, you only get one shot. The data's got to be processed and valid. You want to be the first one to process it and publish it -- if you're the second one, you kind of miss out."
