UNC makes profitable investment in open source for drug research


Author: Tina Gasperson

The United States spends about $30 billion annually on pharmaceutical research and development, and Europe spends about $20 billion, according to the Pharmaceutical Research and Manufacturers of America (PhRMA). For at least one major university, Linux has played a big part in making that research more efficient and cost-effective.

Computer-aided drug discovery (CADD) is the process of identifying and testing molecules to determine how effective they are at treating symptoms and curing diseases. It is the precursor to the production of patented pharmaceuticals. CADD is a huge business and a fiercely competitive one, especially since advances in clustering and grid computing have drastically reduced the time it takes to go from hypothesis to conclusion. The widespread use of Linux in multi-CPU operations has greatly reduced the expense.

The University of North Carolina’s Laboratory for Molecular Modeling uses chemical informatics, a process that applies statistical data analysis to screen compounds, in its CADD work. Alex Tropsha, associate professor and director of the lab, has led its work since 1991. Over time the lab's processes became too tedious for researchers to work efficiently, even though the lab was using the most powerful software available.

“The entire process of model building was laborious and manual,” says Tropsha. The individual modules, though technologically advanced, were not integrated. Researchers who did not know how to run each one needed help from technical staff to carry out the exhaustive molecular testing it takes to find the tiny percentage of candidate drugs that actually perform as researchers hope.

Tropsha says the lab already owned a cluster, but it was “totally useless” because researchers couldn’t access its parallel-computing power. The lab needed to find a way to make the process more user-friendly, or risk losing its position as one of the top CADD programs in the United States.

UNC and IBM worked together to build an improved CADD process on IBM’s WebSphere on Linux, relying on a workflow structure to automate most of the tasks involved in running test processes. Because the Molecular Modeling Lab staff had never built a workflow for such a complex array of processes, that task was a challenge. “It took time, because our flow integrates different kinds of software and small steps,” says Tropsha.

The team began the transition to WebSphere in November 2002. Because WebSphere is geared more toward database queries than statistical analysis, IBM had to design and code a number of workflow loops encased in Java “wrappers.” After about a year and a half of workflow creation, design, and coding, the lab had a complete, customized version of its model-building software by May 2004.

The new software has significantly enhanced researchers’ ability to perform complex analysis. A process that was once strictly one user, one computer, and one program at a time has evolved into a tool that lets a single non-technical user run tests on thousands of models with one click.

“Even in April we had software we were able to show at the American Chemical Society meeting,” Tropsha says. “For a relatively small investment we were able to translate our homegrown scientific tools and create professional software.”

Ironically, Tropsha, UNC, and IBM may stand to make a good deal of money from that small investment. Although the university provides its drug discovery software (binaries only) free of charge to academic users, it has started a new for-profit company called Phorcast. “We have a technology that we couldn’t find anywhere else,” Tropsha says. “We’re transferring that knowledge to [Phorcast], and the company will continue development and hopefully grow.”

