September 10, 2008

NEBC Bio-Linux distro falls short

Author: W. Dean Freeman

As the fields of computational biology and bioinformatics become more important, not only to the economy, but to our understanding of the natural world and ourselves, Linux is becoming a better platform on which to build and deploy the software scientists will rely on. A few groups have even gone so far as to create entire distributions geared for computational biology, such as BioBrew and Debian-Med. One of the more prominent comes from the Oxford-based Environmental Bioinformatics Centre (NEBC) of the Natural Environment Research Council (NERC). Bio-Linux does not sell itself as your average distribution, but it does not measure up to an average distribution either.

NEBC Bio-Linux comes in two forms -- a Knoppix-based live DVD, and a net-install from NEBC's servers. To get the latter, you must fill out an application form. If you're worthy, NEBC will send you an installation package that comprises a CD, a diskette, and installation instructions. The CD provides the boot loader and customized scripts that partition the hard disk and copy the Bio-Linux snapshot image from NEBC's servers to the target computer. The diskette contains the information necessary for the installation scripts to find and access the server. Actual installation is expected to be conducted via the Internet, at a scheduled time negotiated with NEBC's help desk.

I downloaded and booted the live DVD image, and found an impressive amount of bioinformatics software available under a menu cleverly marked with a stylized DNA molecule. For instance, Jemboss is a Java-based front end to the EMBOSS (European Molecular Biology Open Software Suite) set of programs, which provide a comprehensive set of tools for sequence analysis and more. Taverna is a workflow manager that is compatible with various suites of bioinformatics tools. It allows scientists to piece together the processes which they want to pass their data through. Handlebar is a Web-based system for keeping track of bar codes for samples and inventory around the lab, written in Perl with a PostgreSQL back end. With more than 40 bioinformatics-related applications -- everything from rasmol to MrBayes -- and the ability to obtain more from NEBC's repositories, the scientific software selection does not disappoint.

However, for a system that touts itself as being geared toward "wet bench" scientists who may or may not have much Linux experience, all is not bread and roses. Most of the shortcomings in Bio-Linux, though, are inherited from Knoppix rather than being anything the NEBC introduced itself.

Worse, though, unlike most modern Linux live CD distributions, the Knoppix base provides no easily accessible hard drive installation option. After a little research, I found the hidden invocation: running sudo knoppix-installer from the shell. However, the Knoppix installation program is perhaps one of the most difficult I've encountered. Successfully partitioning the disk alone, for which knoppix-installer relies on QtParted, was a feat that makes the somewhat archaic methods of NetBSD seem like a walk in the park. When I finally figured that out, installed the system, and rebooted, I was immediately greeted with a kernel panic and a failure to boot. Multiple attempts all met the same fate, and I was unable to produce a working install from the live DVD.

In addition, the distro's development tools come up short. Bio-Linux Live provides the Eclipse integrated development environment, listed with the other bioinformatics software on the system. However, it does not include the EPIC Perl extensions for Eclipse. Perl is one of the most common languages used for bioinformatics development, due to its native text parsing ability and the BioPerl modules. Not providing EPIC, especially as one cannot permanently add it while running a live DVD image, is definitely a problem in my book.

Perhaps the biggest problem with version 4 of Bio-Linux is that it is out of date, having been released in 2005 and running kernel 2.6.12. The userland applications, similarly, are older versions, from OpenOffice.org 2.0 beta to the scientific software (the included version of Taverna was 1.4, whereas 1.7 is current).

NEBC Bio-Linux 5 beta

I had hoped that the new version 5 beta, which was announced in July and which is based on Ubuntu 8.04, would fix the issues that I had with version 4, but it's no panacea. While Bio-Linux 5 beta's Ubuntu base is much easier to deal with than the three-year-old version of Knoppix, which leads to a much cleaner install process, the system has its own issues.

For instance, unlike Bio-Linux 4, version 5 has yet to integrate the bioinformatics tools into the application menu. That means if you want to use the tools, you must know their command names. At least the developers stuck them all in one directory -- /usr/local/bioinf/.
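Until the menu returns, one way to discover what is available is to list the executables in that directory. The following is only a minimal sketch: the function name list_tools is hypothetical, and it assumes the tools sit as executable files directly inside /usr/local/bioinf/ rather than in subdirectories, which I have not verified for every package.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the names of executable files found
# directly inside a tools directory (default: /usr/local/bioinf).
list_tools() {
    local dir="${1:-/usr/local/bioinf}"
    local f
    for f in "$dir"/*; do
        # Skip directories, data files, and anything not marked executable.
        if [ -f "$f" ] && [ -x "$f" ]; then
            basename "$f"
        fi
    done
}
```

Calling list_tools with no argument inspects the default directory; passing another path, as in list_tools /some/other/dir, inspects that one instead.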

Version 5 currently fails to include Java, even though a lot of bioinformatics software, including Jemboss and Taverna, is Java-based. Of course, one can install Java from the repositories, but that has its own host of problems, the first of which is that the bash script used to run Jemboss is hard-coded to look for Java at /usr/local/bin/java/, while the repositories install it to /usr/bin/java/. This makes using the distribution as a live CD next to impossible if you need any of the Java-based suites.
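A launcher script can avoid this kind of breakage by locating the interpreter at run time instead of assuming a fixed path. The sketch below is not the Bio-Linux or the stock Jemboss script; the function name find_java is hypothetical, and the fallback list simply assumes the java binary lives at one of the two locations mentioned above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the path of a usable Java interpreter,
# or return non-zero if none can be found.
find_java() {
    # 1. Honor JAVA_HOME if it points at an executable interpreter.
    if [ -n "${JAVA_HOME:-}" ] && [ -x "$JAVA_HOME/bin/java" ]; then
        printf '%s\n' "$JAVA_HOME/bin/java"
        return 0
    fi
    # 2. Otherwise use whichever "java" comes first on PATH.
    if command -v java >/dev/null 2>&1; then
        command -v java
        return 0
    fi
    # 3. Fall back to the two locations discussed in the review (assumed).
    local candidate
    for candidate in /usr/local/bin/java /usr/bin/java; do
        if [ -f "$candidate" ] && [ -x "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}
```

A launcher could then begin with something like java_bin=$(find_java) || exit 1 and run "$java_bin" with whatever arguments the application needs, so the same script works whether Java came from the repositories or was installed by hand.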

Bio-Linux 4's live DVD image comes in at 1.9GB. Bio-Linux 5 Beta weighs in at a hefty 2.1GB. There is really no need for this -- what need is there for GNOME Games, for instance, on a laboratory computer?

It seems that while NEBC Bio-Linux is a laudable endeavor that does address certain needs for the community at which it's targeted, it comes up short in a number of ways. While attempting to provide an easy-to-use, canned solution, it stumbles in many areas, from the application process for the "standard" installation to the various issues with the live DVD.

As most of the bioinformatics software included in Bio-Linux is available in the repositories for Ubuntu and other systems, the advantage of a dedicated distribution over installing that software oneself on a "normal" Linux system is slim, at best. The two major applications that are not readily available in repositories, Jemboss and Taverna, can both be quickly installed by hand by anyone who has a basic familiarity with extracting a tarball and making a shell script executable. Both programs are written in Java and are started by bash scripts, and, as an added bonus, the stock Jemboss script isn't hard-coded with the location of the Java interpreter, unlike Bio-Linux's.

With new tools for creating live Linux images (especially for Fedora) and the ability to streamline what is included, there is really no reason a future version of NEBC Bio-Linux, or a similar project from another source, could not be made leaner and more focused. I hope that by the time NEBC Bio-Linux 5 is actually released, the developers will have at least remedied the situation with their Jemboss script and recreated the menu so that the included bioinformatics tools are more readily accessible.
