The Semantic Web

Author: Daniel Rubio

The World Wide Web was conceived as a medium for people to share information easily. With its wide adoption, it was only natural that machine-based processes would also use it to access and retrieve data. In an effort to standardize the programmatic consumption of Web-based information, the World Wide Web Consortium (W3C) has been working toward the next-generation Web, dubbed the Semantic Web.

By its very nature, HTML, the Web’s most pervasive language for creating documents, is designed for display purposes, which makes it difficult for applications to consume. A look at the retrieval and classification work that every search engine must perform illustrates how complex the task is.

The underpinnings of the Semantic Web are two specifications also under the umbrella of the W3C: the Resource Description Framework (RDF) and the Web Ontology Language (OWL), both of which have XML-based syntaxes.

RDF’s primary goal is to provide a common standard for representing metadata. Mechanisms for this already exist, such as the HTML meta tag, which allows keyword and description text to be placed within a document, but they fall short of providing a comprehensive structure for describing Web resources, especially for machine-based processing.

RDF is built on statements, each comprising a subject, a predicate, and an object. The subject and predicate are normally identified by unambiguous Uniform Resource Identifiers (URIs); the object is either another URI or a literal value such as a string or a number. The three-part structure lets a statement say what kind of relationship holds between two things, while the use of URIs lets a statement refer to any resource, whether network-accessible or not, such as a corporation or a person. This is in contrast to a URL, which is a particular type of URI used in Web documents to identify a network location.
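
The statement model can be sketched in a few lines of Python. This is only an illustration of the subject, predicate, and object structure, not an RDF library; the `Triple` type, the `objects_of` helper, and the `catalog#item100` URI are assumptions made for the example.

```python
from typing import List, NamedTuple


class Triple(NamedTuple):
    subject: str    # URI of the resource being described
    predicate: str  # URI of the property
    obj: str        # URI of another resource, or a literal value


# Hypothetical URIs, used only for illustration.
BASE = "http://www.baseballcaps.com/"
ITEM = BASE + "catalog#item100"

statements = [
    Triple(ITEM, BASE + "products/model", "San Francisco Giants"),
    Triple(ITEM, BASE + "products/retailprice", "25"),
]


def objects_of(subject: str, predicate: str, triples: List[Triple]) -> List[str]:
    """Return the object of every statement matching subject and predicate."""
    return [t.obj for t in triples
            if t.subject == subject and t.predicate == predicate]


print(objects_of(ITEM, BASE + "products/model", statements))
```

Because every part of a statement is a plain identifier or value, statements from different sources can be merged into one list and queried the same way.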

The following fragment represents a simple RDF snippet describing a product associated with a Web document:

 

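   <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
            xmlns:terms="http://www.baseballcaps.com/products/"
            xmlns:reviews="http://www.baseballcaps.com/reviews/">

     <!-- One resource (item100) described by five property/value pairs. -->
     <!-- rdf:datatype takes a full URI; a prefixed name such as xsd:string is not expanded. -->
     <rdf:Description rdf:ID="item100">
       <terms:model rdf:datatype="http://www.w3.org/2001/XMLSchema#string">San Francisco Giants</terms:model>
       <terms:retailprice rdf:datatype="http://www.w3.org/2001/XMLSchema#integer">25</terms:retailprice>
       <terms:weight rdf:datatype="http://www.w3.org/2001/XMLSchema#decimal">0.5</terms:weight>
       <reviews:ratingBy rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Biggest Giants Fan</reviews:ratingBy>
       <reviews:stars rdf:datatype="http://www.w3.org/2001/XMLSchema#integer">5</reviews:stars>
     </rdf:Description>

   </rdf:RDF>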
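
To show how such a fragment maps onto statements, the following rough sketch reads a well-formed, shortened copy of it with Python's standard `xml.etree.ElementTree` module. It handles only this simple shape (an `rdf:Description` with literal-valued child elements) and is no substitute for a real RDF parser; the `extract_triples` function and the base URI are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Well-formed, shortened copy of the fragment above.
RDF_XML = """\
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:terms="http://www.baseballcaps.com/products/"
         xmlns:reviews="http://www.baseballcaps.com/reviews/">
  <rdf:Description rdf:ID="item100">
    <terms:model>San Francisco Giants</terms:model>
    <terms:retailprice>25</terms:retailprice>
    <reviews:stars>5</reviews:stars>
  </rdf:Description>
</rdf:RDF>
"""


def extract_triples(rdf_xml, base_uri):
    """Return (subject, predicate, object) tuples for literal-valued properties."""
    root = ET.fromstring(rdf_xml)
    description_tag = "{" + RDF_NS + "}Description"
    id_attribute = "{" + RDF_NS + "}ID"
    triples = []
    for description in root.findall(description_tag):
        # rdf:ID names a fragment relative to the document's base URI.
        subject = base_uri + "#" + description.get(id_attribute)
        for prop in description:
            # ElementTree spells qualified names as "{namespace}local";
            # the predicate URI is the namespace followed by the local name.
            namespace, local = prop.tag[1:].split("}")
            triples.append((subject, namespace + local, prop.text))
    return triples


# The base URI is an assumption; in practice it is the document's own address.
triples = extract_triples(RDF_XML, "http://www.baseballcaps.com/catalog")
for triple in triples:
    print(triple)
```

Each child element of `rdf:Description` becomes one statement: the described resource is the subject, the element's qualified name is the predicate, and its text is the object.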

RDF declarations can be associated with an HTML document by linking to them from the document head, as follows, so that RDF-enabled applications can find and process them:

 

<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
  <head>
    <title>My Semantic Web Document</title>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
    <!-- Points RDF-aware applications at the metadata for this page -->
    <link rel="alternate" type="application/rdf+xml" href="mymetadata.rdf" />
  </head>
  <body>
    <h1>My Semantic Web Document</h1>
  </body>
</html>
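
From the consuming side, an RDF-enabled application might locate the linked metadata along these lines. This is a minimal sketch using Python's standard `html.parser` module; the `RdfLinkFinder` class is hypothetical, and fetching and parsing `mymetadata.rdf` is left out.

```python
from html.parser import HTMLParser


class RdfLinkFinder(HTMLParser):
    """Collect the href of every <link> that advertises RDF/XML metadata."""

    def __init__(self):
        super().__init__()
        self.rdf_links = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if (tag == "link"
                and attributes.get("rel") == "alternate"
                and attributes.get("type") == "application/rdf+xml"):
            self.rdf_links.append(attributes.get("href"))


# Condensed copy of the page shown above.
PAGE = """<html><head>
<title>My Semantic Web Document</title>
<link rel="alternate" type="application/rdf+xml" href="mymetadata.rdf" />
</head><body><h1>My Semantic Web Document</h1></body></html>"""

finder = RdfLinkFinder()
finder.feed(PAGE)
print(finder.rdf_links)
```

The returned href is relative, so a real client would resolve it against the page's own address before retrieving the RDF document.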

RDF also has its own schema language, RDF Schema (RDFS), used for defining the classes and properties that RDF statements may use. Building upon RDF Schema is OWL, a separate but complementary specification that adds features for describing more complex associations, such as relationships between classes, cardinality, and equality.

OWL comes in three variants: OWL Lite, OWL DL, and OWL Full, each more expressive than the one before it. All three allow the representation of terms and their interrelationships, which is what is known as an ontology.

Although a comprehensive example of OWL would go beyond the scope of this article, the following snippet illustrates its capabilities:

 

<owl:Class rdf:ID="BaseballCaps">

  <!-- BaseballCaps is a kind of BaseballParaphernalia -->
  <rdfs:subClassOf rdf:resource="sports:BaseballParaphernalia"/>

  <!-- Every hasTeam value of a BaseballCaps member must belong to MLB -->
  <rdfs:subClassOf>
     <owl:Restriction>
       <owl:onProperty rdf:resource="#hasTeam" />
       <owl:allValuesFrom rdf:resource="#MLB"/>
     </owl:Restriction>
  </rdfs:subClassOf>

</owl:Class>

Notice how the nested structures define a hierarchy along with its constraints: BaseballCaps is a subclass of BaseballParaphernalia, and a restriction requires every value of the #hasTeam property to come from the #MLB class. Declarations of this type make it easy to correlate information of a similar nature, reducing the need for complex algorithms to extract the meaning of, and the relationships among, pieces of Web-based information.
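
The kind of correlation this enables can be approximated in plain Python. The sketch below is not an OWL reasoner: it follows subclass links transitively and treats the allValuesFrom restriction as a simple membership check, which is a simplification, since a real reasoner would instead infer that the values belong to the class. The `SportsMerchandise` class and the members of `MLB` are hypothetical additions for illustration.

```python
# Direct subclass links. The first mirrors the OWL fragment above;
# the second is a hypothetical addition for illustration.
SUBCLASS_OF = {
    "BaseballCaps": ["BaseballParaphernalia"],
    "BaseballParaphernalia": ["SportsMerchandise"],
}


def superclasses(cls, links):
    """Return every class reachable from cls by following subclass links."""
    found = []
    pending = list(links.get(cls, []))
    while pending:
        current = pending.pop()
        if current not in found:
            found.append(current)
            pending.extend(links.get(current, []))
    return found


def satisfies_all_values_from(values, allowed):
    """Simplified allValuesFrom: every value must be a member of the class."""
    return all(value in allowed for value in values)


MLB = {"San Francisco Giants", "New York Yankees"}  # hypothetical members

print(superclasses("BaseballCaps", SUBCLASS_OF))
print(satisfies_all_values_from(["San Francisco Giants"], MLB))
```

An application holding these declarations could conclude that anything described as a BaseballCaps item is also sports merchandise, without parsing free text to work that out.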

Although search engines will still play a big role in helping people locate information across the Web, the foundations laid by the Semantic Web will lower the barrier to entry for applications that rely on software agents for information retrieval and mining.

Daniel Rubio is the principal consultant at Osmosis Latina, a firm specializing in enterprise software development, training, and consulting based in Mexico.

Category:

  • Web Development