Knut-Olav Hoven: Layers of referentiality

Most computer systems interact with other systems in some way, consuming or providing data, or both. Referential resources are at the core of the Semantic Web.

There are many ways to identify a piece of data. In a database table, a row represents an entity, and that entity is identified by a value, most likely a number or a short string. These kinds of IDs are extremely local to the system that owns the data. They carry no meaning beyond being some kind of ID for something, and once exported from the system they become nothing but noise.

IDs exist on different levels, from the simplest ID of a database entity to globally unique, resolvable resource identifiers such as URLs. I present the layers of referentiality. Each layer up the stack provides a little more abstraction and a little more context to the identifier.

Layer	Referentiality	Identifier	Usage
Semantic Web	Globally unique resolvable ID	URL	Identifier of web resources. Provides information about how to access a resource.
Integration	Globally unique ID	URN/URI	Identifier of system resources. Contains a system identifier. Might be a URL. Should never change.
Domain	System local ID	URN	Identifier of domain concepts. References object type and id.
Persistence	Entity ID	Alphanumerical	Identifier of table rows, document file names.
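
To make the table concrete, here is a minimal sketch of one entity carrying an identifier on each layer. Everything in it is an assumption for illustration: the `Customer` type, the customer number `C-1042`, the `urn:example:crm` namespace (the `example` URN namespace is reserved for documentation) and the `crm.example.com` host are hypothetical names, not part of any real system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Customer:
    row_id: int           # persistence layer: primary key, local to one table
    customer_number: str  # a value the domain experts already use


def domain_id(customer: Customer) -> str:
    # Domain layer: object type plus an ID that users can relate to.
    return f"customer:{customer.customer_number}"


def integration_id(customer: Customer, namespace: str = "urn:example:crm") -> str:
    # Integration layer: prefix the domain ID with a system namespace so that
    # the identifier is globally unique. This value should never change.
    return f"{namespace}:{domain_id(customer)}"


def web_url(customer: Customer, base: str = "https://crm.example.com") -> str:
    # Semantic Web layer: a resolvable location. Unlike the integration ID,
    # this may change when the service moves.
    return f"{base}/customers/{customer.customer_number}"


customer = Customer(row_id=4711, customer_number="C-1042")
print(customer.row_id)
print(domain_id(customer))
print(integration_id(customer))
print(web_url(customer))
```

Note that the database key `4711` never leaves the persistence layer; only the identifiers built from the customer number are handed to other systems.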

An ID on the domain layer doesn't have to be a concatenation of the object type name and the primary key of the entity, and probably shouldn't be. A good ID on the domain layer is something that the domain experts (the users of your system) can relate to. And that is most likely not some automatically incremented ID in a database table.

Pay special attention to the globally unique IDs of the integration layer: they should never change. URLs to websites and web services, on the other hand, are known to change often. How many times have you tried to access a URL just to find out it was a broken link? You should of course take precautions to avoid breaking the web, but sometimes it's inevitable, for good or bad reasons.

A resource representation, for example an HTML document, might be retrieved in different ways. Probably the most obvious way to retrieve an HTML document is to make an HTTP GET request to the web server that its URL resolves to. Another way is to read the same document from a database or file system, and then the URL from the first example becomes pretty useless.
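
One way to picture this is a small catalog that maps a stable integration-layer ID to whatever retrieval mechanism is available at the moment. The sketch below is only an illustration under stated assumptions: `ResourceCatalog`, `from_file` and `from_http` are hypothetical helpers, the URN is a made-up example, and the HTTP resolver simply assumes the response is UTF-8 text.

```python
from pathlib import Path
from typing import Callable, Dict
from urllib.request import urlopen

# A resolver is any zero-argument function that returns the representation.
Resolver = Callable[[], str]


class ResourceCatalog:
    """Maps stable resource IDs to the way they are currently retrieved."""

    def __init__(self) -> None:
        self._resolvers: Dict[str, Resolver] = {}

    def register(self, resource_id: str, resolver: Resolver) -> None:
        # Re-registering an ID swaps the retrieval mechanism; the ID stays.
        self._resolvers[resource_id] = resolver

    def fetch(self, resource_id: str) -> str:
        return self._resolvers[resource_id]()


def from_file(path: Path) -> Resolver:
    # Retrieve the representation from the local file system.
    return lambda: path.read_text(encoding="utf-8")


def from_http(url: str) -> Resolver:
    # Retrieve the representation with an HTTP GET request.
    def resolve() -> str:
        with urlopen(url) as response:
            return response.read().decode("utf-8")  # assumes UTF-8 content
    return resolve
```

A caller only ever holds the ID, for example `catalog.fetch("urn:example:docs:page:welcome")`. Whether that ID was registered with `from_http(...)` or `from_file(...)` can change over time without the caller noticing.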

You should focus on defining a strong namespace for your resources, to provide resource identifiers that are globally unique and very likely never to change. From a strong globally unique ID, in time, the Semantic Web will reveal itself.

Knut-Olav Hoven

Sunday, October 2, 2011
