
This is intended to give someone new to the Semantic Web a
basic overview of the technologies involved, and a guide to
where to go to find out more.
The basis for the augmented functionality of the Semantic Web
is
If any Semantic Web application is to be able to access and
use data from any other such application, every data object and
every data schema/model must have a unique and universal means
of identification. These identifiers are called URIs (Uniform
Resource Identifiers).
The computer industry has agreed, by and large, to use XML
(Extensible Markup Language) to represent not only human
readable documents, but data in general. The XML standards give
a syntactic structure for describing data. Unfortunately, XML
can be used in many different ways to describe the same data.
This makes it too open and arbitrary to support the type of
widespread and ad hoc data integration envisaged for the
Semantic Web. The Semantic Web vision proposes to represent
machine-processable information using RDF (Resource Description
Framework), which is commonly serialised as XML. RDF defines a
general, common data model that adheres to web principles. The W3C
is a strong supporter of this approach.
RDF provides a consistent, standardised way of describing and
querying internet resources, from text pages and graphics to
audio files and video clips. It gives syntactic
interoperability, and provides the base layer for building a
Semantic Web. RDF defines a directed graph of relationships.
These are represented by object-attribute-value triples, i.e. an
object O has an attribute A with value V, often written as
A(O,V). For instance, telnet(janet_bruten, 3128700) represents
the fact that the person object Janet Bruten has the telnet
number 312-8700.
Figure 1: A simple directed graph
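The triple notation above can be sketched in a few lines of Python. This is only an illustration of the data model, not an RDF library: the helper names build_graph and values_of are hypothetical, the second triple is an assumed extra fact, and the plain strings stand in for what would be URIs in real RDF.

```python
from collections import defaultdict

# Each fact is an (object, attribute, value) triple, i.e. A(O, V) in the text.
# The identifiers are illustrative strings, not real URIs.
triples = [
    ("janet_bruten", "telnet", "3128700"),  # the example from the text
    ("janet_bruten", "type", "person"),     # assumed extra fact for illustration
]

def build_graph(triples):
    """Index triples as a directed graph: object -> [(attribute, value), ...]."""
    graph = defaultdict(list)
    for obj, attr, value in triples:
        graph[obj].append((attr, value))
    return graph

def values_of(graph, obj, attr):
    """Return every value V for which attr(obj, V) has been asserted."""
    return [v for a, v in graph.get(obj, []) if a == attr]

graph = build_graph(triples)
print(values_of(graph, "janet_bruten", "telnet"))
```

Each object is a node, and each attribute is a labelled edge pointing at its value, which is what makes the collection of triples a directed graph.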
RDF itself is a composable and extensible standard for
building data models. To support the definition of a specific
vocabulary for a data model, which can itself be published,
another layer is required. RDF Schema (RDFS) allows a
designer to define and publish the vocabulary used by an RDF
data model, i.e. to define the data objects and their attributes.
For instance, it might define that people have a phone
attribute. RDFS also supports classes and subclasses,
so that hp_employee could be defined as a subclass of
person.
Both RDF and RDF Schema are commonly written in XML and can use
XML Schema datatypes. The existence of standards for describing
data (RDF) and data
attributes (RDF Schema) enables the development of a set of
readily available tools to read and exploit data from multiple
sources. The degree to which different applications can share
and exploit data is sometimes termed syntactic
interoperability. The more standardised and widespread
these data manipulation tools are, the higher the degree of
syntactic interoperability, and the easier and more attractive
it becomes to use the Semantic Web approach rather than a
point solution.
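As a rough illustration of what an RDF Schema vocabulary adds, the sketch below models the person and hp_employee classes and the phone attribute mentioned above. It assumes a deliberately simplified model (one superclass per class, one owning class per attribute), and the names SUBCLASS_OF, ATTRIBUTE_DOMAIN, superclasses and allows_attribute are hypothetical, not part of RDFS.

```python
# Hypothetical vocabulary: class -> direct superclass (None for a root class).
SUBCLASS_OF = {
    "person": None,
    "hp_employee": "person",
}

# Hypothetical schema: attribute -> the class it is defined for.
ATTRIBUTE_DOMAIN = {
    "phone": "person",
}

def superclasses(cls):
    """Return cls followed by all of its ancestors, nearest first."""
    chain = []
    while cls is not None:
        chain.append(cls)
        cls = SUBCLASS_OF.get(cls)
    return chain

def allows_attribute(cls, attribute):
    """True if the attribute is defined for cls or for one of its superclasses."""
    return ATTRIBUTE_DOMAIN.get(attribute) in superclasses(cls)

print(superclasses("hp_employee"))
print(allows_attribute("hp_employee", "phone"))
```

Because hp_employee is a subclass of person, it inherits the phone attribute without the schema having to restate it.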
If data is to be truly 'understandable' by multiple
applications, and therefore become information, semantic
interoperability is required. Syntactic
interoperability is all about parsing data correctly. Semantic
interoperability requires mapping between terms, which in turn
requires content analysis. This requires formal and explicit
specifications of domain models, which define the terms used and
their relationships. Such formal domain models are sometimes
called ontologies. Ontologies define data
models in terms of classes, subclasses, and properties. For
instance, we might define herbivores as the subclass of
animals that eat plants. Figure 2 shows a very simple example
ontology for animals.
Figure 2: An example ontology
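The herbivore definition can be approximated in code as a class hierarchy plus a property. The sketch below is a toy classifier, far simpler than the formal reasoning an ontology language supports, and it reads "eats plants" as "eats at least one thing, all of which are plants", which is an assumption. The classes giraffe, lion and tree and the helper names are also hypothetical and are not taken from Figure 2.

```python
# Hypothetical class hierarchy: class -> direct superclass.
SUBCLASS_OF = {
    "animal": "thing",
    "plant": "thing",
    "giraffe": "animal",
    "lion": "animal",
    "tree": "plant",
}

# Hypothetical 'eats' property: class -> set of classes it eats.
EATS = {
    "giraffe": {"tree"},
    "lion": {"giraffe"},
}

def is_a(cls, ancestor):
    """True if cls equals ancestor or is a direct or indirect subclass of it."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = SUBCLASS_OF.get(cls)
    return False

def is_herbivore(cls):
    """An animal whose recorded food is non-empty and consists only of plants."""
    food = EATS.get(cls, set())
    return is_a(cls, "animal") and bool(food) and all(is_a(f, "plant") for f in food)

print([c for c in SUBCLASS_OF if is_herbivore(c)])
```

The point is that membership of the herbivore class is derived from the definition rather than asserted directly for each animal.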
Over the years a vast amount of research has been carried out on
how to represent and reason about knowledge. In Europe, funding
has been heavily concentrated on the development of OIL
(Ontology Inference Layer), a language for defining ontologies.
In the US, DARPA funded a somewhat similar project called DAML
(DARPA Agent Markup Language). More recently these
activities have been combined into a project to work on a merged
ontology language, DAML+OIL.
In late 2001 the W3C set up a working group called WebOnt to
define an ontology language for the Web, based on DAML+OIL. All
of these ontology languages aim to provide developers with a way
to formally define a shared conceptualisation of a domain. They
encompass both a means of representing the domain and a means of
reasoning about that representation, typically by means of a
formal logic. In the case of DAML+OIL this is Description
Logic.
If the Semantic Web is indeed to become a global database,
and if its development is evolutionary and distributed, then
there are issues of accessibility, trust and credibility. Not
all data sources will be universally accessible, so there needs to
be a robust and extensible security model. Not all data sources
will be equally reliable. If instead of just returning an answer
to a query a Semantic Web application could also attach a proof
of how that answer was derived, then the querying application
could potentially do some reasoning about how 'believable' that
fact is. At the very least, derived facts could be attributed to
a source, and over time applications could be developed which
rate sources according to their integrity. These upper layers of
the stack are the least researched and present some of the most
difficult technical challenges faced by the Semantic Web
venture.
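The simplest form of this idea, attributing each fact to a source and ranking answers by how far the source is trusted, might look like the sketch below. Everything here is an assumption made for illustration: the Fact class, the source names, the trust scores and the answer_with_provenance function are hypothetical, and a real proof layer would carry a derivation rather than just a source label.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Fact:
    triple: tuple  # (object, attribute, value)
    source: str    # identifier of the data source that asserted the triple

# Hypothetical trust ratings in the range 0 to 1; unknown sources score 0.
SOURCE_TRUST = {"hr_directory": 0.9, "public_wiki": 0.4}

def answer_with_provenance(facts, obj, attr):
    """Return (value, source, trust) for each matching fact, most trusted first."""
    matches = [
        (f.triple[2], f.source, SOURCE_TRUST.get(f.source, 0.0))
        for f in facts
        if f.triple[0] == obj and f.triple[1] == attr
    ]
    return sorted(matches, key=lambda m: m[2], reverse=True)

facts = [
    Fact(("janet_bruten", "telnet", "3120000"), "public_wiki"),
    Fact(("janet_bruten", "telnet", "3128700"), "hr_directory"),
]

for value, source, trust in answer_with_provenance(facts, "janet_bruten", "telnet"):
    print(value, source, trust)
```

A querying application receives conflicting answers together with their origins and can decide for itself which one to believe.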