Open Source, Open Research


Colleges and universities are as much about research as they are about the classroom experience, and just as open source software can provide cost savings, independence, and flexibility to educational institutions through courseware and recordkeeping, it can assist in the research process. Open source and open data standards play a role in collaboration, laboratory and literary scholarly research, publishing, and managing the overall research programs at institutions of higher learning.

References for Individuals and Institutions

Naturally, the role that software of any type plays in academic research varies greatly with the field of study. Software itself is at the center of computer science and physics simulation research, for example, while it plays more auxiliary roles such as data mining and statistical reporting in the social sciences. Literary criticism and history depend heavily on library work, where open source can provide important infrastructure.

Regardless of the field, however, collecting, studying, and tracking the work of predecessors and contemporaries is always critical, as evidenced by the large number of open source projects designed to help track sources and manage bibliographical resources. The simplest applications, such as I, Librarian, help the researcher collect, organize, and annotate reference material. Most such applications allow full-text and metadata searching of collected resources, and can directly search online catalogs and repositories such as PubMed and arXiv, major online reference sites like JSTOR and ARTstor, and search services such as Google Book Search and Google Scholar.

Full-featured reference management applications include Web-based tools like Aigaion, Connotea, and Wikindx; desktop applications like Referencer, Pybliographer, JabRef, and BibDesk; and specialty tools such as Zotero, a Firefox extension, and Bibus, which is designed to integrate with OpenOffice and with Microsoft Word. Most of these reference management applications add features beyond cataloging and searching, such as the ability to share collections with team members, to manage multiple collections as projects, and to automatically generate formatted citations for inclusion in a paper, either in the general-purpose BibTeX or EndNote formats or in the preferred styles of major journals.
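As a rough illustration of what generating a formatted citation involves, the sketch below renders one reference record as a BibTeX entry. It is a minimal sketch, not how any of the applications above work internally: the function name `to_bibtex`, the citation key, and the sample record are all hypothetical placeholders, and the code does not escape special characters such as braces inside field values.

```python
def to_bibtex(key, entry_type, fields):
    """Render one reference as a BibTeX entry string.

    `key`, `entry_type`, and `fields` are illustrative inputs; a real
    reference manager would also escape special characters in values.
    """
    lines = [f"@{entry_type}{{{key},"]
    for name, value in fields.items():
        # Wrap each value in braces, the usual BibTeX field delimiter.
        lines.append(f"  {name} = {{{value}}},")
    lines.append("}")
    return "\n".join(lines)


# Placeholder record, not a real publication.
sample = {
    "author": "Doe, Jane",
    "title": "An Example Title",
    "journal": "Journal of Examples",
    "year": "2009",
}

entry = to_bibtex("doe2009example", "article", sample)
print(entry)
```

The same record could be passed to a different formatter to produce another citation style, which is essentially the service these tools provide on top of a single stored collection.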

Several applications take the additional step of emphasizing the Web publication of collected bibliographies. BibCiter, refbase, and RefDB can be used to build online indices. Refbase and RefDB can be used to create searchable, personal archives of an author’s own work–a practice known as self-archiving–as well as larger institutional repositories collecting the research of individuals from entire organizations.

Open Repositories and Open Access

Self-archiving is endorsed by more than 90 percent of the academic journals surveyed by the Securing a Hybrid Environment for Research Preservation and Access (SHERPA) project. In most of the surveyed journals, authors of a peer-reviewed paper can self-archive pre-print versions of their work without seeking the publisher’s permission, but must obtain permission to self-archive post-print versions.

For individuals and smaller departments, self-archiving tools like Refbase and RefDB may be sufficient, but at the institutional scale–particularly for institutions that publish in large volume–a more focused package like EPrints might be more suitable. EPrints is a Web-based platform for creating digital repositories, explicitly designed to implement the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH), a standard created by the Open Archives Initiative (OAI) to facilitate interoperability between digital repositories. Naturally, many digital library and collections tools, such as DSpace, are useful to institutions collecting and publishing their own work, but EPrints is notable because of its close association with the open access movement.
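To make the harvesting idea concrete: an OAI-PMH harvester talks to a repository with plain HTTP requests that carry a `verb` parameter, and the repository answers in XML. The sketch below only builds the URL for a `ListRecords` request using the `oai_dc` (Dublin Core) metadata format that the protocol requires repositories to support. The base URL is a hypothetical placeholder rather than a real repository, the helper name is invented for this example, and the code makes no network request.

```python
from urllib.parse import urlencode


def build_list_records_url(base_url, metadata_prefix="oai_dc",
                           from_date=None, set_spec=None):
    """Build an OAI-PMH ListRecords request URL (sketch only).

    `base_url` is whatever endpoint the repository publishes; the one
    used below is a made-up placeholder.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        # Optional selective harvesting: only records changed since this date.
        params["from"] = from_date
    if set_spec:
        # Optional: restrict the harvest to one set defined by the repository.
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


# Hypothetical endpoint, for illustration only.
BASE = "https://repository.example.edu/oai"

full_harvest = build_list_records_url(BASE)
incremental = build_list_records_url(BASE, from_date="2009-01-01")
print(full_harvest)
print(incremental)
```

Because every compliant repository accepts the same small set of requests, one harvester can collect metadata from many institutions, which is the interoperability the protocol is meant to provide.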

OAI and other open access participants encourage self-archiving and institutional repositories in order to preserve free online access to journal articles and conference proceedings, and they also push journal publishers to adopt open access policies toward their own digital archives. That could take the form of a completely open access journal, or of a partially open access journal that delays free online publication of its content for a period of time, during which access is exclusive to paying customers. There are many academic groups involved in the open access movement; the Directory of Open Access Journals is a good starting point for finding resources.

Open Data

Historically, when researchers at one institution wanted to review or examine the raw data behind the published work of others, the only way to do so was through person-to-person contact, followed by institutional approval–a potentially time-consuming process. Several groups are working to simplify the sharing of raw data between research projects to foster scholarly collaboration.

The Software Environment for the Advancement of Scholarly Research (SEASR) project is a framework for humanities and social science researchers, allowing scholars to publish their work in standardized formats, to analyze and mine the data in other published research materials, and to correctly transform structured data between published formats. SEASR produces a Web-based software stack called Meandre to facilitate publishing and “dataflow” applications.

Science Commons (SC) is a project attempting to facilitate collaborative scientific research. As the name suggests, SC is organized by the same group that runs the Creative Commons project, although it is a separate initiative. Like Creative Commons, one of SC’s most significant advancements is in developing legal and policy frameworks that make it easier for interested individuals and institutions to share their work. SC promotes open access by encouraging publishing of research under Creative Commons licenses, and the Scholar’s Copyright Addendum Engine (SCAE) is a tool to assist self-archiving by automatically generating the legal forms to authorize publication on the Internet.

SC also maintains the Open Access Data Protocol, which is not a computer protocol but a “best practices” policy for institutions to publish accessible scientific databases. SC has put its tools to the test in two public projects, the Neurocommons repository of open access data and publications from neuroscience, and the Health Commons, which does the same for pharmaceutical and biological research.

Research Administration and Compliance

An important SC project of a different sort is the Biological Materials Transfer Project, which assists researchers by providing a standardized set of contracts for transferring physical biological specimens between research groups and institutions–specimens such as DNA, cell lines, and even animals. Properly managing biological materials is critical not just for scientific reasons, but for the legal and regulatory requirements of the institution.

Large research universities can incur significant administrative and technical overhead in managing the legal and governmental obligations of their research, not to mention the budgeting, proposal management and review, and other processes required to oversee multiple projects.

The Kuali Foundation is a non-profit organization that develops open source administrative software for educational institutions. Its primary project was a financial information system, but one of its newer efforts is Kuali Coeus (KC), a full-fledged research administration system for universities. KC is adapted from MIT’s Coeus system, and although it is still under development, already handles proposals, budgeting, and integration with the US government’s Web site. The next release is scheduled to include managing institutional review boards, human participants, and conflict of interest issues.

KC is the only large-scale open source research administration project for educational institutions, but there is work in other areas that might be applicable, depending on the research. For example, Akaza Research’s OpenClinica is an open source system for managing clinical health trials, and although it is designed for medical institutions, medical research conducted at universities faces many of the same issues.

KC is also the only open source project that integrates any sort of grants management, albeit in limited form. Grants management packages that facilitate application for and compliance with federal, state, and local government grants, as well as private grants, are often expensive purchases from the proprietary software industry. With millions of dollars at stake, most educational institutions choose to pay the annual licensing fees rather than risk the loss of funds from a mismanaged grant application.