August 26, 2008

Nepomuk and KDE to introduce the semantic desktop

Author: Bruce Byfield

If you follow technology trends, you have probably heard of the semantic desktop -- a data layer for annotating and sharing the information in your computer. But what you may not be aware of is that the semantic desktop is not a distant goal, but scheduled to arrive at the end of 2008. And, when it does, the idea will probably be implemented through the work done by the Nepomuk project, and, most likely, by KDE first.

Ansgar Bernardi, deputy head of the Knowledge Management Department at Deutsches Forschungszentrum für Künstliche Intelligenz (DFKI, or the German Research Center for Artificial Intelligence) and Nepomuk's coordinator, explains, "The basic problem that we all face nowadays is how to handle vast amounts of information at a sensible rate." According to Bernardi, Nepomuk takes a traditional approach by creating a meta-data layer with well-defined elements that services can be built upon to create and manipulate the information.

"The first idea of building the semantic desktop arose from the fact that one of our colleagues could not remember the girlfriends of his friends," Bernardi says, more than half-seriously. "Because they kept changing -- you know how it is. The point is, you have a vast amount of information on your desktop, hidden in files, hidden in emails, hidden in the names and structures of your folders. Nepomuk gives a standard way to handle such information."

Bernardi adds, "This is something that, conceptually, is nothing very new." He points out that office programs have had aspects of the semantic desktop for years. For example, if an email schedules an appointment, some office programs can automatically add the appointment to your calendar. What is different about Nepomuk is that it extends this inter-relatedness to all your files, and greatly expands how you can organize and manipulate the information for your own purposes.

"In terms of usability and public impact, it is of great interest," Bernardi says. If anything, he understates the case.

The Nepomuk project

Nepomuk began at the start of 2006, with €11,500,000 from the European Union. Today, it has 16 partners, including Hewlett-Packard Galway, IBM, and Edge-IT (a subsidiary of Mandriva).

At a high level of generalization, Nepomuk has three main aspects, according to Bernardi. First, there is a standard framework for annotating pieces of information so that connections can be made between them. Second, there are ontologies, the sets of "documented shared understanding" or common concepts that can be defined for particular types of information, such as bio-science or computer desktop use. Finally, there are the tools for making or using the annotations and ontologies, what Bernardi calls the "workspaces that connect to other workspaces and help you in your day to day activities of collecting information, structuring it, making sense of it, and creating new information and communicating it."
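To make the first of these aspects concrete, here is a minimal sketch of an annotation layer, assuming annotations are stored as simple subject-predicate-object statements. The class name, the predicate names ("hasTag", "author"), and the resource identifiers are hypothetical illustrations, not Nepomuk's actual framework or ontologies.

```python
class AnnotationStore:
    """Toy in-memory store of (subject, predicate, object) annotations.

    Hypothetical illustration only; this is not Nepomuk's API.
    """

    def __init__(self):
        # Maps each annotated resource to its set of (predicate, object) pairs.
        self._by_subject = {}

    def annotate(self, subject, predicate, obj):
        """Attach one annotation to a resource."""
        self._by_subject.setdefault(subject, set()).add((predicate, obj))

    def annotations(self, subject):
        """Return all annotations of a resource, sorted for stable output."""
        return sorted(self._by_subject.get(subject, set()))

    def related(self, predicate, obj):
        """Return every resource that carries the given annotation."""
        return sorted(
            subject
            for subject, pairs in self._by_subject.items()
            if (predicate, obj) in pairs
        )


store = AnnotationStore()
# A file and an email share a tag, which is what connects them.
store.annotate("file:///home/user/report.odt", "hasTag", "lab-report")
store.annotate("email:1234", "hasTag", "lab-report")
store.annotate("file:///home/user/report.odt", "author", "Alice")

related = store.related("hasTag", "lab-report")
print(related)
```

The point of the sketch is only that a shared vocabulary of predicates lets otherwise unrelated items, such as a file and an email, be found through the same query.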

Together, these aspects will affect all levels of users. The first time you use a Nepomuk-supported desktop, the data sources you choose will be indexed, and you can add additional annotations later. At least part of Nepomuk's functionality will be based on Lucene, the Java search engine.

For everyday users, the semantic desktop will allow enhanced searching of the type that has already been introduced by applications like Beagle, but with more user control than any existing program. For more sophisticated users, such as academic researchers, Nepomuk will not actually do data-mining itself, but it will assist such processes. For example, a researcher writing a standard lab report could gather much of the necessary information by having it entered automatically based on file annotations.

On whatever level Nepomuk is used, it will be accompanied by standard security protocols -- "permissions on various levels, encryption, and key management," Bernardi says, and most likely identity management and controls for sharing information. Not only should a Nepomuk user be able to control what personal information is shared, but they should also be able to create common views, or shared annotations, so that the same research on, say, the semantic desktop, could be shared with both artificial intelligence researchers and web developers without anyone having to alter the information depending on who received it. Bernardi insists that the security will be simple to use -- "otherwise, nobody will use it," he says.

Bernardi expects that the high level components of Nepomuk will be released under a variety of licenses -- although exactly which ones has not been decided. The basic framework will most likely be released under a BSD or GPL license. By contrast, while some ontologies, such as calendar entries, will probably be released under a free license, others are likely to be proprietary so that Nepomuk partners can protect so-called intellectual property. The same is likely true of the user interfaces for Nepomuk.

The Nepomuk project is scheduled to conclude at the end of 2008. Bernardi says that the project is currently on track. There was a feature freeze in June, and integration tests are now underway. User testing is scheduled for September, and will be followed by decisions about licensing.

After the project is completed, development will continue in a number of venues. DFKI has a history of successful spinoff companies from its research -- 49 in 20 years -- and is likely to start one that involves Nepomuk. At least some of Nepomuk's participating companies are likely to continue their own research, and open development is likely to occur at KDE and elsewhere. A Semantic Desktop Foundation is also being discussed.

Nepomuk and KDE

KDE's involvement with Nepomuk is due to Sebastian Trüg, who is best known for his development of K3b, the popular CD/DVD burner. After Trüg received his computer science degree, he was hired by Edge-IT specifically to work on implementing Nepomuk in KDE.

Trüg explains that adding Nepomuk is a key goal of KDE 4. "There is even Nepomuk technology in 4.0," he explains, "but not much for the user's eye. So far we only use it for tags, ratings, and comments, as well as metadata cached from files for fast searches. In 4.1, you only see the tagging and rating interfaces in apps like Dolphin or Gwenview," he says, referring to KDE's new file manager and image viewer.

However, KDE is planning to extend the use of Nepomuk considerably in the next few releases. Already, Trüg says, "experimental tools exist that allow you to relate people to files (pictures for example) or other people, or to tag Web sites. But in the future [Nepomuk] is supposed to be combined with way more applications. Important, of course, are the KDE-PIM applications, such as KMail and KAddressbook. KMail developers, for example, have plans to provide virtual email folders through Nepomuk. Also, a service will extract contacts from KDE-PIM and link them to emails, IM accounts and dates, and thus provide an abstract view of the people you know."

With these developments in place, Trüg continues, "There should be no distinction for you between an email address and an IM account. You should be able to ask for emails from the guy you are chatting with without having to search through all your mails. Nepomuk should gather the different email addresses he used to send you emails from, for example."
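The idea Trüg describes can be sketched in a few lines, assuming each email address or IM account is simply linked to one person record. Everything here (the class, its methods, and the sample addresses) is a hypothetical illustration and not the KDE-PIM or Nepomuk interface.

```python
class PersonIndex:
    """Toy index linking several identifiers (email, IM) to one person.

    Hypothetical illustration only; not the KDE-PIM or Nepomuk API.
    """

    def __init__(self):
        self._person_of = {}  # identifier -> person name
        self._messages = []   # list of (sender identifier, subject)

    def link(self, person, identifier):
        """Record that an email address or IM account belongs to a person."""
        self._person_of[identifier] = person

    def add_message(self, sender, subject):
        """Store an email by its sender address and subject line."""
        self._messages.append((sender, subject))

    def messages_from(self, identifier):
        """Return subjects of all emails sent by whoever owns the identifier,
        whichever of their addresses was used."""
        person = self._person_of.get(identifier)
        if person is None:
            return []
        return [
            subject
            for sender, subject in self._messages
            if self._person_of.get(sender) == person
        ]


index = PersonIndex()
index.link("Bob", "bob@example.org")
index.link("Bob", "bob@work.example")
index.link("Bob", "xmpp:bob@example.net")
index.link("Carol", "carol@example.org")

index.add_message("bob@example.org", "Lunch?")
index.add_message("carol@example.org", "Minutes")
index.add_message("bob@work.example", "Budget draft")

# Starting from the IM account in a chat window, find Bob's emails.
from_chat_partner = index.messages_from("xmpp:bob@example.net")
print(from_chat_partner)
```

Because the lookup goes through the person rather than the address, the chat account and both email addresses lead to the same set of messages.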

"There are so many ideas in my head (and not only in mine)," Trüg laments. "But so far there are not enough developers." To help change that situation, he has written a series of tutorials to help developers to start writing Nepomuk-oriented code.

After the semantic desktop, the semantic Web?

"Nepomuk is a nice tool to have," Bernardi says, "so it's an end in itself. Certainly, it offers new ways of building applications, and we certainly hope that will fly."

However, Bernardi's strongest hope is that, besides having the potential to change the way people use their personal files, Nepomuk will also lead to the successful implementation of the semantic Web -- the semantic desktop on the next logical level.

Right now, Bernardi explains, the semantic Web faces "a chicken and egg problem." The vision of the semantic Web is to annotate all the information on the Internet in order to create a new level of services. However, nobody is creating the services, because the annotations are not available to make the services useful.

Bernardi's hope is that Nepomuk might provide a way to end this dilemma. "Nepomuk might provide an answer," he says, "because it provides an environment that makes it easy to annotate your information, and because it helps you in maintaining your information. Which means we have a very personal motivation to annotate your information. And then we have the initial starting point."

Whether Nepomuk will have all the implications that its developers hope is impossible to say right now. However, starting some time in 2009, we should have a chance to find out.

