October 25, 2011

KDE 4: Leader of the Semantic Pack

Semantic computing is the future of computing, and KDE4 has the only working implementation of a semantic desktop. If you want an example of where Linux and FOSS are taking the lead, this is a great one.

KDE4 Semantic Desktop

KDE4 has been flinging new technologies at us fast and furiously, many in the category of "semantic desktop." Semantic computing is the future of computing, and KDE4 has the only working implementation of a semantic desktop. The idea of a semantic Web has been kicking around for some years, and the semantic desktop is the same concept applied to the personal computer. Let's take a look and see what all the fuss is about. It starts with Nepomuk, Strigi, and Akonadi, odd names that are bandied about a lot but rarely explained.

Nepomuk

Nepomuk (Networked Environment for Personal, Ontology-based Management of Unified Knowledge) is the KDE4 metadata library. It has a two-fold purpose, as the Nepomuk flyer says:

  • Semantic: it makes knowledge processable by the computer.
  • Social: it supports the interconnection and exchange with other desktops and their users.

The idea is to make the computer form the types of associations that we make every day. I'll wager everyone has an Auntie Em or Uncle Henry who bootstraps their memories in this fashion: "When did Cousin Ellen lose her mind and go to work for the Infernal Revenue, instead of keeping her honest accounting job? Well, that was back when Grandpa Jones broke his hand and couldn't play the mandolin anymore, and Fred and Ethel just had their third baby and had to sell the sports car, and that baby boy is all grown up now and just had his 23rd birthday, so it was back in September 1988 when Ellie went to work for the bad guys."

Nepomuk aims to give us the same web of associations for our files and their contents that we make in our heads all the time. The possibilities are vast: what if Nepomuk could catalogue audio file lyrics, recognize song snippets, incorporate image recognition, and figure out who the musicians in a song or the actors in a movie are? It's not so far-fetched, as Google and others are working on these technologies now.

Making knowledge processable by the computer, and making computers form associations and relations between bits of data the way we do, is a big task. The first baby step in this direction is metadata. Some metadata is automatic, like the EXIF data in photographs and filesystem metadata. This kind of metadata is fixed and inflexible, so an increasing number of applications support user-added metadata, like the comments and star ratings in photo management programs.

Nepomuk manages two types of metadata: user-created metadata, such as tags, star ratings, and comments, and metadata that it searches out and indexes automatically. The second type is the tricky part, where it tries to emulate the human brain. Some examples are the URL of a downloaded file, the email that an attachment came from, the original source of a copied file, information in calendar entries such as people and events and how they are related to each other, files related to people in your address book, the companies those people work for, and various bits of information inside documents. See how it works? Relying on users to tag and annotate their files only goes so far. In fact, some of us are extremely impaired at tagging even though we understand the benefits. Figuring out how to make the computer do it is a big step forward, and it makes our own annotated metadata more useful.

Another feature of the semantic desktop is abstracting search and content away from the filesystem. The hierarchical filesystem doesn't serve us so well for finding things when we're storing hundreds of gigabytes or terabytes of data. The semantic desktop also abstracts content away from files. There is nothing sacred about the computer file; it's the contents of the file that are valuable, and how often do we need information that is scattered amongst several files, and several different types of files? For me, all the time.

Nepomuk uses Soprano for storage management. Soprano is the Qt library for accessing Resource Description Framework (RDF) data. RDF is a family of data interchange formats used in the semantic Web. There are three backends for Soprano: Redland, Sesame2, and Virtuoso. Redland and Sesame2 are not used anymore because they are too slow. You can see your Virtuoso database in ~/.kde/share/apps/nepomuk/repository/main/data/virtuosobackend.
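To make the RDF idea a little more concrete: RDF describes everything as subject-predicate-object statements, usually called triples. The following is a minimal Python sketch of that model and of following associations through it. It is not the Soprano or Nepomuk API; the resource names, the predicates such as hasTag and sentBy, and the helper functions are all invented for illustration.

```python
# Illustrative only: a toy triple store, not the Soprano or Nepomuk API.
# Every resource name and predicate below is made up for this example.

triples = {
    # User-created metadata
    ("file:/home/me/report.odt", "hasTag", "taxes"),
    ("file:/home/me/report.odt", "rating", "4"),
    # Metadata of the kind an indexer could collect automatically
    ("file:/home/me/report.odt", "attachedTo", "email:1234"),
    ("email:1234", "sentBy", "person:ellen"),
    ("person:ellen", "worksFor", "org:revenue-office"),
}


def match(store, subject=None, predicate=None, obj=None):
    """Return all triples matching the pattern; None acts as a wildcard."""
    return sorted(
        (s, p, o)
        for (s, p, o) in store
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    )


def files_from_sender(store, person):
    """Follow file -> email -> sender links to find files a person sent."""
    emails = {s for (s, _, _) in match(store, predicate="sentBy", obj=person)}
    return sorted(
        s for (s, _, o) in match(store, predicate="attachedTo") if o in emails
    )


if __name__ == "__main__":
    print(files_from_sender(triples, "person:ellen"))
```

The point of the sketch is that a question like "which files did Ellen send me?" is answered by walking relations between resources, not by knowing which directory a file lives in.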

Nepomuk runs as several services, which you can see by running ps ax | grep nepomuk.
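If you would rather do the same check from a script, here is a small Python sketch of the same idea as the ps and grep pipeline. The function names filter_processes and running_services are hypothetical helpers written for this example, and the sketch assumes a Unix-like system where ps ax is available.

```python
import subprocess


def filter_processes(ps_output, keyword):
    """Return the lines of ps output that mention keyword (case-insensitive)."""
    keyword = keyword.lower()
    return [line for line in ps_output.splitlines() if keyword in line.lower()]


def running_services(keyword):
    """Run `ps ax` and keep only the matching lines (Unix-like systems only)."""
    result = subprocess.run(
        ["ps", "ax"], capture_output=True, text=True, check=True
    )
    return filter_processes(result.stdout, keyword)


if __name__ == "__main__":
    for line in running_services("nepomuk"):
        print(line)
```

Passing "akonadi" instead of "nepomuk" lists the Akonadi processes in the same way.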

The original Nepomuk project, launched in 2005, was developed with funding from the European Union and other sources to the tune of 17 million euros, or about $23.5 million. Funding ended at the end of 2008, and since then Nepomuk-KDE has continued development for the PC desktop. (There are other offshoots that you can read about on Wikipedia.)

Strigi

Strigi is the indexer that crawls your system extracting semantic data and doing deep indexing on every file for Nepomuk. It is billed as being lightweight, fast, and extensible with plugins. Strigi analyzes all kinds of information in files, such as photo attributes and contents of text files. It computes a SHA-1 hash for every file for quickly finding duplicates.
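Hash-based duplicate detection is easy to try for yourself. The following Python sketch hashes every file under a directory with SHA-1 and groups files that share a digest. It illustrates the general technique only and is not Strigi's code; the function names are made up for this example.

```python
import hashlib
import os
from collections import defaultdict


def sha1_of_file(path, chunk_size=65536):
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root):
    """Map each SHA-1 digest to the files under root that share it."""
    by_hash = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks and anything that is not a regular file.
            if os.path.isfile(path) and not os.path.islink(path):
                by_hash[sha1_of_file(path)].append(path)
    # Keep only digests that occur more than once.
    return {h: sorted(paths) for h, paths in by_hash.items() if len(paths) > 1}


if __name__ == "__main__":
    for digest, paths in find_duplicates(".").items():
        print(digest, paths)
```

Two files with the same digest are almost certainly identical, so comparing short hashes stands in for comparing entire file contents.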

Strigi runs as the nepomukstrigiservice service. The first run is always the longest, and then after that it indexes only changes. Nepomuk and Strigi are configured in System Settings > Desktop Search. You can turn them on and off, configure Nepomuk backups, and limit the amount of memory Nepomuk can use.
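The "long first run, then only changes" behavior can be illustrated with a simple snapshot comparison. This Python sketch records each file's modification time and, on later runs, reports only the files that are new or changed, plus those that have disappeared. How Strigi actually detects changes is not described here, so treat the modification-time approach and the function names as assumptions made for illustration.

```python
import os


def scan(root):
    """Return {path: modification time in ns} for every file under root."""
    snapshot = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                snapshot[path] = os.stat(path).st_mtime_ns
            except OSError:
                continue  # File vanished or is unreadable; skip it.
    return snapshot


def diff_snapshots(previous, current):
    """Work out which files need (re)indexing and which were removed."""
    to_index = sorted(
        path for path, mtime in current.items() if previous.get(path) != mtime
    )
    removed = sorted(path for path in previous if path not in current)
    return to_index, removed


if __name__ == "__main__":
    first = scan(".")
    # On the first run there is no previous snapshot, so everything is indexed.
    print(len(diff_snapshots({}, first)[0]), "files to index on the first run")
    # A later run against an unchanged tree finds nothing to do.
    print(diff_snapshots(first, scan(".")))
```

With an empty previous snapshot every file is new, which is why the first pass is the expensive one.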

Akonadi

The Akonadi storage service manages your PIM (personal information manager) data: address book, calendar, and emails. It provides a common platform for sharing all this data between different applications, and it can be an offline cache for online services like IMAP. Akonadi manages data retrieval and storage, replacing the old method of every application communicating directly with servers and managing its own storage. Moving these functions into a common PIM backend makes life easier for developers, and there are user benefits too, because any Akonadi-aware client should be able to access your mail store, address book, and calendar, in short all your PIM data. Contrast this with the pain of migrating to different clients even when you're using standard storage formats like maildir, mbox, and vCard.

Akonadi data are stored in a MySQL database. Akonadi agents feed PIM metadata into Nepomuk, where it becomes part of your central Nepomuk store and is covered by the same fast, detailed semantic search as everything else. This is not magic but detailed and specific work, as you can see in the output of ps ax | grep akonadi. You'll see agents like /usr/bin/akonadi_ical_resource, /usr/bin/akonadi_nepomuk_contact_feeder, and /usr/bin/akonadi_maildir_resource.


Recall the quotation from the Nepomuk flyer: "Social: it supports the interconnection and exchange with other desktops and their users". This creeps me out because it seems the entire world wants to snoop in my stuff; Google and Facebook, for example, make billions of dollars mining and selling our personal data. But there are times I want to share something with other people, so why not make it better and easier? Surely the future holds something better than attaching everything to email. A better-integrated desktop system also seems like a good thing, moving the tasks of searching, indexing, and accessing data into a common framework rather than reinventing them for every application.

For all the hype about the semantic Web and the semantic desktop, the only real progress is happening in KDE4. Apple and Microsoft aren't doing much, so if you want an example of where Linux and FOSS are taking the lead, this is a great one.
