Biotechnology firms look to Linux, and often IBM, for computing power

Author: JT Smith

By Jack Bryar

Biotechnology companies live and die on the basis of the intellectual
property they can generate for their investors. Such companies don’t sound like
natural allies of the Open Source movement, especially because so much of their
product is proprietary software and closed databases. Yet, ironically, the
biotechnology community is becoming one of the biggest consumers of Linux-based
systems. Will this relationship last?

Recently, dozens of biotech firms have announced that they are converting
their most important systems to Linux platforms. The three most important
reasons are cost, performance, and IBM.

Most commercial Open Source pioneers wrote their first business cases
assuming they could afford to give away Linux software as long as they sold
consulting services. Since then, several have decided they could give away some
code as long as they exempt other code from the GNU GPL or other open licenses. It
has been apparent for some time that IBM has had the best success executing
either of these models. The company’s recent success in the
biotech market shows how it has marketed a combination of subject-matter
expertise in the clients’ business, a great Linux-based system,
and lots of proprietary add-ons to win deal after deal, and make some serious
money doing it.

One of the most important developments in the company’s sales strategy has been
IBM’s success in marketing and developing massive Linux clusters capable of
solving computing problems featuring millions of variables. Part of the
attractiveness of Linux cluster technology is that a bunch of inexpensive machines can
be connected to do heavy-duty number crunching. This is just the sort of
affordable capability that the biotech community has needed for some time. While
many firms can set up Linux clusters, only IBM has supported a dedicated life sciences unit and offered free software and application-specific support targeting
the biotech community.

Recently, IBM Life Sciences developed a program with Seattle’s Institute for Systems Biology.
IBM will provide the institute with its underlying IT systems and
collaborate on tools to model the interaction among various proteins.
The institute’s staffers pointed to IBM’s ability to set up and support Open Source
architecture and its clustering expertise, as well as its expertise in
bioinformatics, as reasons for partnering with the company.

One reason for the institute’s enthusiasm is money, or rather the lack of it. As
a non-profit, the institute has found itself scrambling for cash despite a
staff full of high-profile biology superstars such as institute president Leroy Hood, once a leader in the Human Genome Project.

Hood is one of a growing body of scientists who believe the only
way to understand and develop designer drugs for such ailments as
Parkinson’s Disease, Lou Gehrig’s disease and a host of others is to apply massive
computing resources to analyze the complex web of relationships between proteins
and human genes. Unfortunately, Hood’s vision has been greater than his
financial strength. Hood’s discipline, called proteomics, has not attracted many funders.
The complexities of protein analysis and mapping protein/genetic interactivity should not be underestimated. Most venture capitalists and foundations assume no one has an obvious best methodology and have been sitting on the sidelines. The institute has only raised a fraction of the money it originally counted on. It has had
to figure out how to generate the most computational firepower with the
least amount of cash. That has made low-cost code and inexpensive clustering
technology central to its IT plan.

Ironically, Hood came to the Seattle area
following a recruiting drive underwritten by Bill Gates’ $12 million gift
to the University of Washington’s medical school.

In recent weeks, IBM has announced a series of other deals with
companies and institutions active in the fields of genomics and proteomics. One
of these is the Molecular Mining Company, a biotech services firm specializing in genetic and protein analysis
through data mining techniques. IBM will provide MMC with a Linux-based
platform, software tools and specialty programs to support MMC’s genetics package.

Another is the German biotech company with the odd name of 4SC AG. 4SC has developed its own “cheminformatics” approach to study genetically
influenced diseases such as arthritis, as well as a variety of
infectious diseases. It has developed a screening and analysis system to predict
the biological activities of proteins based on their physical structure as
well as their chemical composition. The company claims it can analyze cross
matches and predict the biological activity of millions of separate
proteins in less than 24 hours.

Fast as this is, it is not fast enough. Finding a financially viable
architecture with the computing power needed to execute the company’s
“Virtual High Throughput Screening Technology” required a new approach.

Like the Institute for Systems Biology and MMC, the firm settled on IBM Life Sciences as the only vendor that combined expertise, specialty software and IT
skills. And Linux. At 4SC, IBM will install 256 double-processor eServers
configured in a Linux cluster to run simulations of the interactions of potential
drug targets, although the configuration features plenty of proprietary IBM
software mixed in along with the Linux platform and tools.

Yet another biotech company, Gene Network Sciences,
is using Linux clusters to create interactive computer models of
the activity of living cells. Earlier this month, GNS generated what it
called a “predictive simulation” of a cancer cell, including many of the
relevant genes and unique proteins found in a human colorectal cancer cell.

Developers say that the computer model will help researchers identify
high-value targets for drug interaction.

Presently GNS’s model has roughly 500 gene and protein markers. It
hopes to increase that to 5,000 markers within a year. Workers there hope their efforts
will lead to designer therapies that will kill cancer cells while leaving
normal cells untouched.

GNS has created a number of proprietary tools to simplify the process of
biological modeling. But these tools run on Open Source architecture. The company
has built a 192-processor Linux supercomputing cluster with (surprise) IBM.

While IBM has jumped to a dominant position in this market, companies
like Linux NetworX are developing their own no-nonsense solutions targeting the
biotech market. They have won several important deals. Among the most important
was a sale to Boehringer Ingelheim Pharmaceuticals.

Boehringer is no start-up. It is one of the world’s largest developers
of pharmaceuticals and medical equipment, with more than 28,000 employees.
However, it, too, is relying on Linux-based clusters to provide the computing
muscle it needs to run performance simulations of thousands of potential
pharmaceutical compounds. At the core of its lab is a 120-processor Linux NetworX
Evolocity cluster supercomputer. The company’s “computational chemists” are
focused on the interaction of various molecules and the “protein binding sites” of
cells and pathogens, to speed the process of identifying potential drugs and
determining precisely how they achieve their effect.

Linux NetworX has also won a deal with Sequenom, a developer and publisher of genetic sequencing data.

To be sure, other firms are beginning to develop solutions mixing
proprietary and open code, promising even better performance or lower prices. Sun
has been particularly active in biotech. It has used this community to
evangelize the benefits of an alternative to clustering, using processing cycles
from all the machines in a workgroup or enterprise.

Sun’s Grid Engine Initiative has won a number of biotech customers. Both
Cognigen and Plexxikon
are using Sun’s Grid to run the complex calculations needed to develop
designer drugs capable of interacting with specific genes at the molecular
level. These initiatives are based on Solaris and what Sun calls a
high-end version of the Grid Engine. However, the company admits that at least
25% of the biotech companies and research institutions using Grid are
running the Open Source version on Linux or UNIX. Sun has signed a deal with SuSE to
distribute the Linux version.

Dell has also begun to aggressively promote Linux clusters built on its
hardware. In recent days, the company has announced new relationships with Red Hat
and Oracle. According to Dell vice president Russ Holt, clusters fit into
Dell’s business plan as an attractive high-end alternative to mainframes, and
a way to gain a toehold in markets including economic forecasting,
atmospheric research, and oil and gas exploration, as well as biotech.

To be sure, the broader Open Source movement has had some negative
effects on some biotech firms and on the broader market. The market for
“bioinformatics” products such as gene and protein databases and tools for analyzing them has shrunk dramatically according to some analysts, who once projected this
market would grow into a $30 billion industry. Universities and independent researchers have generated so many Open Source databases and tools that many analysts are radically revising their estimates. They point to the impact of open databases such as the University of California’s GENOME project, and the GenBank run by the U.S. National
Institutes of Health, and the explosion of Open Source data mining and protein analysis tools being developed by the academic community.

So, while these proprietary biotech firms may be benefiting from Open Source today, they face the long-term impact of the broader Open Source movement cutting into their businesses.
