October 18, 2007

Fedora (not that one) provides a platform for interoperability

Author: Mayank Sharma

There's a wealth of information stored in online collaborative services like YouTube, Flickr, and Wikipedia, but are these Web 2.0 services built to share their content beyond their own boundaries? A group of academics at Cornell University argues that this new wave of applications should be constructed with interoperability in mind. The result of their research, funded by DARPA and the National Science Foundation (NSF), is Fedora, the Flexible Extensible Digital Object Repository Architecture. The project was recently awarded a $4.9M grant by the Gordon and Betty Moore Foundation to expand the functionality of its software platform.

According to the proposal document, "the overall outcome for the Fedora Commons proposal is to enable the organizational and technical frameworks necessary for sustainable open source software to support revolutionary change in how scientists, scholars, and educators produce and share their intellectual outputs, and ensure the integrity and longevity of information."

If this is the first time you're hearing of this Fedora, it's probably because of its specialized nature. Fedora is used by various educational and cultural institutions around the world, including the Topaz/PLoS ONE open access journal system, the National Science Digital Library (NSDL), Max Planck Society's e-scholarship system, the Chicago Historical Society's multimedia encyclopedia, the Australian national institutional repository initiative (ARROW), Oxford University's digital archive, and the Perseus humanities computing project. The project has also held six Fedora User Conferences over the past two years in the US, Europe, and Australia.

So what exactly is a digital repository?

According to the project's proposal to the foundation, the goal of the project was to devise new information architectures that facilitate interoperable access to, and management of, increasingly complex and heterogeneous digital collections. Without naming specific projects, the same document warns that if each new project devised its own idiosyncratic solution for handling digital material, the likely result would be an unmanageable set of stovepipe systems that inhibit the interconnection of information and endanger the long-term welfare of digital resources. Thus, a primary goal of the initial Fedora architecture was to design a uniform "digital object" model that could represent the full variety of digital content, and a generalized repository model for consistent access to and management of that content.
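To make the idea of one object model for heterogeneous content concrete, here is a minimal Python sketch. It assumes a simplified model in which every object has a persistent identifier (PID) and a set of named datastreams, each tagged with a MIME type, and a repository that offers the same access call for every object. The class names, the `demo:` identifiers, and the `IMAGE`/`VIDEO` datastream IDs are hypothetical illustrations; this is not Fedora's own API.

```python
from dataclasses import dataclass, field


@dataclass
class DigitalObject:
    """Hypothetical digital object: a persistent ID plus named datastreams."""
    pid: str
    # datastream ID -> (MIME type, raw content)
    datastreams: dict[str, tuple[str, bytes]] = field(default_factory=dict)

    def add_datastream(self, ds_id: str, mime_type: str, content: bytes) -> None:
        self.datastreams[ds_id] = (mime_type, content)


class Repository:
    """Uniform ingest and retrieval, whatever kind of content an object holds."""

    def __init__(self) -> None:
        self._objects: dict[str, DigitalObject] = {}

    def ingest(self, obj: DigitalObject) -> None:
        # PIDs must be unique within the repository.
        if obj.pid in self._objects:
            raise ValueError(f"duplicate PID: {obj.pid}")
        self._objects[obj.pid] = obj

    def get_datastream(self, pid: str, ds_id: str) -> tuple[str, bytes]:
        # The same call serves an image, a video, or a journal article.
        return self._objects[pid].datastreams[ds_id]


# Usage: two very different kinds of content behind one interface.
repo = Repository()
photo = DigitalObject("demo:1")
photo.add_datastream("IMAGE", "image/jpeg", b"\xff\xd8\xff")
video = DigitalObject("demo:2")
video.add_datastream("VIDEO", "video/mp4", b"\x00\x00\x00\x18ftyp")
repo.ingest(photo)
repo.ingest(video)
mime, data = repo.get_datastream("demo:2", "VIDEO")
```

Because the repository deals only in identifiers and datastreams, a photo and a video are stored and retrieved through identical calls; type-specific handling is left to the applications built on top.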

"I would hesitate to call Fedora a 'digital library solution' per se," says Carl Lagoze, senior research associate at Cornell's Computing and Information Science department and a member of the board of Fedora Commons, which develops Fedora. "More appropriately, I would call it a service-oriented architecture that combines content management, semantic knowledge management, and Web services integration. It is used as a foundation for digital library applications, but also for other Web 2.0 applications."

According to Peter Murray, assistant director of new service development at OhioLINK, library management systems like Koha and Evergreen focus on physical objects: the purchase, cataloging, discovery, and loaning of books, DVDs, and magazines. To see where Fedora fits in, Murray says, one has to understand that in specialized systems like YouTube and Flickr, the front-end interface is tightly bundled with the back-end content repository. One cannot store pictures in YouTube or videos in Flickr. "In the academic world, this is similar to the DSpace institutional repository software," Murray says.

"Fedora is a pure content repository service, with the key notion here that it is a service to other applications, not necessarily end users. Fedora will store and retrieve just about any kind of digital object. It relieves application developers from the hassles of managing digital objects. Fedora, in fact, could be an underlying component of systems as diverse as an image digital repository, a video digital repository, a wiki, a blog, or a journal delivery system."

No real alternatives

If you believe Murray, Fedora is unique in its approach to being a content repository. He argues that other content repository systems don't take the same approach to the long-term storage and preservation of digital data for a wide variety of use cases. "Fedora's flexibility means that it can handle just about any kind of digital data. Its extensibility comes from its ability to bring new behaviors to that digital data -- make an image object return a thumbnail of itself, for instance. And its architecture is such that it can be used in standalone situations or as a component of a service-oriented architecture."
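Murray's thumbnail example describes behaviors that sit alongside stored content rather than inside it. The following Python sketch assumes a hypothetical registry that maps a content type and a method name to a function. `disseminate`, `getThumbnail`, `flipVertical`, and the toy pixel-grid "image" are illustrations only and do not reflect Fedora's actual dissemination mechanism.

```python
from typing import Callable

Pixels = list[list[int]]
Behavior = Callable[[Pixels], Pixels]


def thumbnail(pixels: Pixels) -> Pixels:
    """Naive thumbnail: keep every second pixel in each direction."""
    return [row[::2] for row in pixels[::2]]


# Hypothetical registry: content type -> method name -> behavior.
# Behaviors are attached to a type of object, not baked into stored data.
BEHAVIORS: dict[str, dict[str, Behavior]] = {
    "image": {"getThumbnail": thumbnail},
}


def disseminate(content_type: str, method: str, data: Pixels) -> Pixels:
    """Run a registered behavior against an object's stored data."""
    try:
        behavior = BEHAVIORS[content_type][method]
    except KeyError:
        raise LookupError(f"no behavior {method!r} for {content_type!r}") from None
    return behavior(data)


# Extending the system: register a new behavior without touching stored images.
BEHAVIORS["image"]["flipVertical"] = lambda pixels: pixels[::-1]

# A 4x4 toy grayscale "image" with values 0..15.
image = [[r * 4 + c for c in range(4)] for r in range(4)]
small = disseminate("image", "getThumbnail", image)  # [[0, 2], [8, 10]]
flipped = disseminate("image", "flipVertical", image)
```

Registering another behavior, like the `flipVertical` entry, extends what every stored image can do without modifying the stored data itself.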

Murray says that while DSpace, another open source content repository project, is as flexible as Fedora in dealing with different forms of digital content, it lacks built-in extensibility. "Apache Jackrabbit is similar to Fedora," Murray says, "but as a Java library it is limited somewhat to that programming language."

Finally, Lagoze points to some of Fedora's unique features, starting with its REST and SOAP APIs, which allow developers to control Fedora repositories from any programming language. Fedora can also associate Web service applications with data objects for dynamic dissemination of content. It provides full version control over all aspects of the data model, and at the base level it stores objects in a simple XML format called FOXML (Fedora Object XML).
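Version control over stored content can be sketched as an append-only history. In this minimal Python example, modifying a datastream never overwrites it; each change is appended with a timestamp, so both the latest content and the content as of any past date remain retrievable. The `VersionedDatastream` class, its methods, and the `DC` datastream ID are hypothetical and do not reflect FOXML or Fedora's actual versioning code.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class VersionedDatastream:
    """Hypothetical datastream whose history is append-only."""
    ds_id: str
    # (creation time, content), oldest first
    versions: list[tuple[datetime, bytes]] = field(default_factory=list)

    def modify(self, content: bytes, when: datetime) -> None:
        """Record a new version instead of overwriting the old one."""
        if self.versions and when <= self.versions[-1][0]:
            raise ValueError("versions must be added in chronological order")
        self.versions.append((when, content))

    def latest(self) -> bytes:
        # Raises IndexError if no version has been recorded yet.
        return self.versions[-1][1]

    def as_of(self, when: datetime) -> Optional[bytes]:
        """Content that was current at `when`, or None if none existed yet."""
        current = None
        for created, content in self.versions:
            if created > when:
                break
            current = content
        return current


ds = VersionedDatastream("DC")
ds.modify(b"<title>Draft</title>", datetime(2007, 1, 1, tzinfo=timezone.utc))
ds.modify(b"<title>Final</title>", datetime(2007, 6, 1, tzinfo=timezone.utc))
```

Past versions stay queryable: `ds.as_of(datetime(2007, 3, 1, tzinfo=timezone.utc))` returns the draft, while `ds.latest()` returns the final title.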

Using the grant

Lagoze thinks the foundation's grant gives the project the legs it needs to take Fedora beyond the current open source distribution in a number of ways. It'll help the project develop its nascent community. Lagoze also mentions expanding Fedora's functionality in two key areas. "One is tighter integration and scaling of semantic technologies, which is essential to build Web 2.0 applications that mash up lots of varied distributed content. The other is enterprise-level scaling and reliability and standardization."

"Our goal," Lagoze says, "is really to combine these two areas so we can all move beyond applications-level ad hoc social network collaborative environments to ones that follow a more open standards approach to promote sustainability of the information contained and its cross-application portability. Our belief is that more domains, such as scholarship, will follow the blog/wiki social application paradigm, and we want systems that are not one-offs, but ones that pay attention to the integrity demands of these domains."

Murray believes that the grant will help Fedora find new developers and users. Despite getting its start in academia, he thinks the project has many uses beyond universities, libraries, and research centers.
