Linux makes life better: Open Source at BioIT World

By Daniel P. Dern

One area where high-end computer purchasing appears to be not merely strong
but growing is bio-IT, also known as bio-informatics — the use of computer power
in support of research in genetics, pharmaceuticals and other aspects of
life sciences, much of which pipelines to the discovery of new products
such as drugs and new ways to use them.

Bio-IT organizations have become users of the biggest systems in the world,
both stand-alone machines and SETI@Home-type distributed systems, by absorbing, retrieving, processing and analyzing terabytes of genetic and other data. More computing power can shave the time needed to bring new products to market and therefore, in turn, speed up bio-IT companies’ ability to generate revenue.

For companies like Compaq, EMC, HP, IBM, Sun and others, bio-IT therefore represents a major high-end market. For example, Celera Genomics’ data center includes more than 600 interconnected Alpha processors and a 70Tbyte database, and there’s a 2,768-processor six-TeraOPS AlphaServer SC system at the Pittsburgh Supercomputing Center.

One indicator (or symptom) of an emerging niche’s market validity is being the
focus of a topically dedicated trade-show event, as opposed to just being an area in some bigger show. IDG, owner of the LinuxWorld Conference and Expo, just put together its first BioIT World Conference and Expo, held last week in Boston.

It’s inarguable that bio-informatics constitutes a serious hardware market.

According to the show’s FAQ, IDC “predicts that IT infrastructure spending in the biosciences market will grow from $9.2 billion in 2000 to more than $27 billion in 2005.”
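To put the IDC forecast in yearly terms, here is a minimal sketch of the compound annual growth rate it implies. The function name is just illustrative, and because the forecast says "more than $27 billion," the result (roughly 24 percent a year) should be read as a lower bound rather than IDC's own figure.

```python
def compound_annual_growth_rate(start_value, end_value, years):
    """Return the constant yearly growth rate that turns start_value into end_value."""
    return (end_value / start_value) ** (1.0 / years) - 1.0


# Figures quoted from the IDC forecast, in billions of dollars.
spending_2000 = 9.2
spending_2005 = 27.0  # "more than $27 billion", so this is treated as a floor

rate = compound_annual_growth_rate(spending_2000, spending_2005, years=2005 - 2000)
print(f"Implied growth: {rate:.1%} per year")
```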

Not surprisingly, there was a lot of Linux and other Open Source software at Bio-IT World, in sessions and on the show floor. Much of the focus was, equally unsurprisingly, on clustering and other forms of distributed computing.

Linux and Open Source in session

Bill Hilf, a senior consulting IT architect at IBM, gave a half-day workshop,
“Building Linux Clusters for Bioinformatics Customers,” in which he talked about
issues such as general cluster design, application development, kernel issues,
using Open Source technologies and the Internet, and how to build reliable,
scalable Linux clusters.

“In the life sciences, Linux clusters are typically designed either
with a bounded or open, ‘un-bounded’ configuration,” Hilf notes.
“Bounded configurations are designed to run a select set of applications,
such as using BLAST (Basic Local Alignment Search Tool), for genomic/proteomic similarity searching. In ‘open’
configurations, the cluster applications may vary in purpose and type, and the cluster’s
configuration can change frequently — you see these most often in universities, where many groups are sharing the cluster resources.

“Linux and Open Source are well suited for life sciences, particularly as it
relates to clustered computing, as it provides a very adaptive technological framework,”
comments Hilf. “Much like the evolution of Linux, bioinformatics is
propelled by the twists and folds of new concepts and new perspectives to problem solving. Using open standards and adaptive systems is really the only way to approach problem domains with such biological characteristics.”

The other Linux-related session was a half-day session on Beowulf Clusters given by Donald J. Becker, chief technical officer and acting president of Scyld Computing Corporation.

Linux, Open Source on the show floor

There was also no shortage of Linux among the close to 80 exhibitors on the show floor.
Based on my booth-knocking survey, here’s a quick overview of BioIT World exhibitors
touting, or otherwise using, Linux or other Open Source. To be fair, most
of the desktop/client-side software seemed to be running on Windows, ditto a fair
amount of server-side wares either on Microsoft or on Oracle.

Hardware vendors touting Linux

The growing presence of Linux in mainstream hardware vendors continues to
remind me of the TCP/IP explosion of the early 1990s, which began with “huh?”
and quickly changed to “yeah, we do that too” and soon turned to somewhere between
“of course” and “we’ve standardized on it.” While I don’t necessarily expect
vendor-specific Unixen to go away, it’s getting harder to find workstation-and-higher
hardware vendors who don’t include Linux support.

Hardware vendors with Linux in their booths included Compaq, HP, IBM and Sun.
Apple was also there, with OS X stuff.

Compaq is getting more into Linux and Open Source, although the company is also pushing
its Tru64 Unix. Among other things, Compaq brought a blade cluster running
Scyld Beowulf.

HP was showing HP-UX, but was also showing its blade servers running Red Hat.

And IBM, which has gone for Linux in a big way, was showing its new Cluster 1300, with two-CPU 1GHz Pentium III nodes, running Red Hat 7.1. The booth had an eight-node machine; IBM supports up to 128 two-CPU nodes.

Only one of these booths was, I believe, handing out penguins — the hand-sized, squeezable plastic foam kind.

Storage is, unsurprisingly, also big in bio-IT, where terabyte-sized data sets are
slung around casually. Storage vendors present who supported access for Linux systems
included Auspex, EMC, and Quantum’s Network Attached Storage division.

Clustering and other distributing/load stuff

Clustering, and other distributed/parallelizing systems and tools, also
had a strong presence on the floor — again, no surprise, between the size of
some of the problems, and, for many companies, the financial impetus to get answers
as fast as possible.

  • United Devices, whose CEO created SETI@Home, was there, showing its MetaProcessor cycle-sharing technology, which runs on both Linux and Windows clients (the back end is Linux), highlighting uses in life sciences.
    UD’s Software Development Kit allows application developers to
    take any algorithm or application that’s parallelizable, and distribute it.
    UD has several bio-IT efforts; MetaProcessor is also being
    used in Internet-based distributed bio-informatics projects, including cancer research.

  • Platform.com provides clustering management and other distributed computing software, including ClusterWare for Linux. Its customers in the bio-IT space include
    Abbott Labs, Biogen, Celera, DuPont, Eli Lilly, Mayo Clinic, and Novartis.
    A booth rep said, “We’re hardware agnostic; our product makes it easy to install and
    administer and get clusters running.”

  • Microway Inc. offers Athlon- and Intel-based Beowulf clusters running Red Hat; its average cluster size is about 64 nodes, though some go up to 250 to 300 nodes (500 to 600 CPUs).
    You can find Microway clusters at places like LION Bioscience, Millennium Pharmaceuticals, and lots of universities. Microway started with Digital Alphas, running Tru64 Unix; according to the person I spoke to at the booth, “Linux was obviously a very nice port from there.”

  • Avaki provides the ability to “federate” computers securely, using its proprietary grid technology.
    Avaki works with Red Hat Linux, as well as
    AIX, Compaq/Tru64, IRIX, Solaris and Microsoft Windows 2000 and NT,
    on desktop systems through enterprise-class servers. Bio-IT users include
    Gene Logic, Inc., Infinity Pharmaceuticals, Structural Bioinformatics, Inc., and also
    the Scripps Research Institute, which used Avaki to combine resources from
    five supercomputing centers on the Internet.

    Avaki is also in the process of submitting its reference implementation of the
    Secure Grid Naming Protocol to SourceForge.net.
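For readers wondering what "parallelizable" means in practice for the cycle-sharing and cluster products above, here is a minimal sketch of the pattern: split a sequence database into independent chunks, score each chunk separately, then merge the results. Everything in it is an assumption made for illustration. It does not use United Devices' SDK or any vendor's API, the similarity measure is a toy position-by-position match count rather than BLAST, the function names are hypothetical, and local threads stand in for remote machines. It also assumes a non-empty database.

```python
from concurrent.futures import ThreadPoolExecutor


def identity_score(query, subject):
    """Toy similarity: count positions where the two sequences agree."""
    return sum(1 for a, b in zip(query, subject) if a == b)


def split_into_chunks(items, n_chunks):
    """Split a list into at most n_chunks contiguous, near-equal pieces."""
    size, extra = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        end = start + size + (1 if i < extra else 0)
        if end > start:  # skip empty pieces when there are few items
            chunks.append(items[start:end])
        start = end
    return chunks


def best_in_chunk(query, chunk):
    """Work unit: find the best (score, sequence) pair within one chunk."""
    return max((identity_score(query, s), s) for s in chunk)


def search(query, database, workers=4):
    """Farm chunks out to workers, then merge the per-chunk winners."""
    chunks = split_into_chunks(database, workers)
    # Threads are a stand-in here for separate cluster nodes or clients.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial = list(pool.map(lambda c: best_in_chunk(query, c), chunks))
    return max(partial)


if __name__ == "__main__":
    db = ["ACGTACGT", "ACGTTTTT", "GGGGCCCC", "ACGAACGA", "TTTTACGT"]
    print(search("ACGTACGA", db, workers=3))
```

The point of the design is that no chunk needs to know about any other, so the work units can run anywhere and in any order; only the final merge step sees all the partial results.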

Apps on tap

Application exhibitors were, unsurprisingly, largely in the category of “improve speed of
finding answers, which leads to faster time-to-market for new drugs.”
This included everything from “faster/better searching of internal and external
databases” through “business process pipelining.”

Although I’d informally estimate that the majority of the application vendors
at the show ran on Microsoft OSes, there were some Linux/Unix ones as well:

  • LeadScope Inc. is focused on the lead discovery process in the
    pharmaceutical arena, doing chemically intelligent analysis of
    high-throughput screening runs. LeadScope runs on Linux and Sun Solaris.
    The person I spoke to at the booth said that Solaris is the industry standard, but
    Linux is more cost-effective. “We see comparable or better throughput on
    [Red Hat] Linux versus Solaris, at a third of the cost for the whole machine
    — $15,000 versus $5,000.”

  • LION Bioscience AG, a data integration company offering analysis tools, along with skills and expertise to integrate internal and public data sources, provides software that runs on Unix, Linux and Microsoft OSes.

  • PathWay Prism from Physiome Sciences, mostly written in Java, runs on Linux, etc., and does pathway modeling and simulation for biological processes like drug discovery.

  • Vertical*i is a software company focused on business development
    for life sciences. Its LEA platform, which runs on Linux and other OSes,
    helps “manage the sales pipeline, work flow, content management,” etc.
    LEA is all Web-based, for any browser (that’s the booth rep’s words — I didn’t query
    whether “any” included Opera or Konqueror).

Systems integrators that worked with Linux and had booths included:
Networked Information Systems, Viaken Systems Inc. and
Allez Software. Oddly enough, this category’s exhibitors have the most time-wasting splashy home pages.

Open Source organizations in the bio-IT space

The bio-IT field has its own Open Source group (and, for all I know, more than just one):
Bioinformatics.org, founded in 1998 as
(according to the Web site) “a non-profit, academe-based organization
committed to opening access to bioinformatics research projects, providing
Open Source software for bioinformatics by hosting its development, and keeping
biological information freely available.”

According to director Jeffrey Bizzaro, who was in the booth, Bioinformatics.org
was started “to counter a lot of the privatization in the field.” The group currently
has more than 2,500 members, from countries including Brazil, China, Germany, India, Japan, and the United Kingdom as well as Canada and the United States. It hosts over
53 bioinformatics projects and Web sites, and according to Bizzaro,
“at least half or more actually work.”

Bioinformatics.org uses the Open Source Initiative’s definition of “Open Source,”
and will host projects under any OSI-approved license, including GPL, BSD, and MIT.
The project management system used by the site is based on SourceForge.net.

Daniel P. Dern is a freelance technology writer. His Web site is www.dern.com.
