Sunday, October 2, 2011

Layers of referentiality

Most computer systems interact with other systems in some way, consuming data, providing data, or both. Referential resources are at the core of the Semantic Web.

There are many ways to identify a piece of data. In a database table, a row represents an entity, and that entity is identified by a value, most likely a number or a short string. These kinds of IDs are extremely local to the system that owns the data. Beyond signalling that they identify something, they carry no meaning, and once exported from the system they become nothing but noise.

IDs exist on different levels, from the simplest IDs of database entities to globally unique, resolvable resource identifiers such as URLs. I present the layers of referentiality. Each layer up the stack provides a little more abstraction and a little more context to the identifier.

| Layer | Referentiality | Identifier | Usage |
|---|---|---|---|
| Semantic Web | Globally unique resolvable ID | URL | Identifier of web resources. Provides information about how to access a resource. |
| Integration | Globally unique ID | URN/URI | Identifier of system resources. Contains a system identifier. Might be a URL. Should never change. |
| Domain | System local ID | URN | Identifier of domain concepts. References object type and ID. |
| Persistence | Entity ID | Alphanumerical | Identifier of table rows, document file names. |
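To make the layers concrete, here is a minimal Python sketch that derives an identifier for one entity at each layer. Everything in it is an assumption made for illustration: the `Customer` type, the `crm` system name, the `urn:example:` prefix and the `crm.example.com` host are hypothetical placeholders, and the exact ID formats are just one possible choice.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Customer:
    row_id: int        # persistence layer: primary key, only meaningful inside the table
    customer_no: str   # business key that domain experts recognise (hypothetical)


def domain_id(c: Customer) -> str:
    # Domain layer: object type plus a business key, local to this system.
    return f"customer:{c.customer_no}"


def integration_id(c: Customer, system: str = "crm") -> str:
    # Integration layer: prefix a system identifier so the ID is unique
    # across systems. "urn:example:" is a placeholder namespace.
    return f"urn:example:{system}:{domain_id(c)}"


def web_url(c: Customer, base: str = "https://crm.example.com") -> str:
    # Semantic Web layer: a resolvable URL that also says how to reach the resource.
    return f"{base}/customers/{c.customer_no}"


customer = Customer(row_id=42, customer_no="ACME-0042")
print(customer.row_id)           # 42
print(domain_id(customer))       # customer:ACME-0042
print(integration_id(customer))  # urn:example:crm:customer:ACME-0042
print(web_url(customer))         # https://crm.example.com/customers/ACME-0042
```

Note that the primary key `42` never leaks out of the persistence layer; each higher layer only adds context to the business key.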

An ID on the domain layer doesn't have to be a concatenation of the object type name and the primary key of the entity, and probably shouldn't be. A good ID on the domain layer is something that the domain experts (the users of your system) can relate to. And that is most likely not some automatically incremented ID in a database table.

Pay special attention to the globally unique IDs of the integration layer: they should never change. URLs to websites and web services are known to change often. How many times have you tried to access a URL just to find out it was a broken link? You should of course take precautions to avoid breaking the web, but sometimes it's inevitable, for good reasons or bad.

A resource representation, for example an HTML document, might be retrieved in different ways. Probably the most obvious way to retrieve an HTML document is to make an HTTP GET request to the web server that a URL resolves to. Another way is to read the document from a database or file system, in which case the URL from the first example becomes pretty useless.
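One way to picture this is a resolver that is keyed on a stable, location-independent ID and tries several retrieval mechanisms in turn. The sketch below is only an illustration under assumptions: the `Resolver` class, the fetcher helpers and the file naming scheme are all hypothetical, and an HTTP-backed fetcher is left out to keep the example self-contained.

```python
from pathlib import Path
from typing import Callable, Dict, List, Optional

# A fetcher takes a resource ID and returns the representation, or None if it has none.
Fetcher = Callable[[str], Optional[str]]


class Resolver:
    """Tries each registered retrieval mechanism until one returns a document."""

    def __init__(self) -> None:
        self._fetchers: List[Fetcher] = []

    def register(self, fetcher: Fetcher) -> None:
        self._fetchers.append(fetcher)

    def fetch(self, resource_id: str) -> str:
        for fetcher in self._fetchers:
            body = fetcher(resource_id)
            if body is not None:
                return body
        raise LookupError(f"no representation found for {resource_id}")


def dict_fetcher(store: Dict[str, str]) -> Fetcher:
    # Stands in for a database lookup keyed on the stable ID.
    return store.get


def file_fetcher(root: Path) -> Fetcher:
    # Hypothetical layout: one HTML file per ID, with ':' replaced by '_'.
    def fetch(resource_id: str) -> Optional[str]:
        path = root / (resource_id.replace(":", "_") + ".html")
        return path.read_text(encoding="utf-8") if path.is_file() else None

    return fetch


resolver = Resolver()
resolver.register(dict_fetcher({"urn:example:docs:welcome": "<h1>Welcome</h1>"}))
print(resolver.fetch("urn:example:docs:welcome"))  # <h1>Welcome</h1>
```

The caller only ever holds the ID; where the document actually lives is a detail of whichever fetcher answers.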

You should focus on defining a strong namespace for your resources, to provide resource identifiers that are globally unique and very likely never to change. From a strong globally unique ID, in time, the Semantic Web will reveal itself.
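As one possible way to get stable, globally unique IDs out of a namespace, the sketch below uses name-based UUIDs (version 5) from Python's standard `uuid` module. This is a swapped-in technique chosen for illustration, not something the layers above prescribe: the `example.com` namespace root and the `mint_id` helper are assumptions, and the resulting IDs are opaque, so they suit the integration layer better than the domain layer.

```python
import uuid

# Assumption: the organisation controls example.com and treats it as a
# permanent namespace root. Changing this value changes every derived ID.
NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "example.com")


def mint_id(object_type: str, business_key: str) -> str:
    # Name-based UUIDs are deterministic: the same namespace and name
    # always produce the same ID, on any machine, at any time.
    return uuid.uuid5(NAMESPACE, f"{object_type}/{business_key}").urn


first = mint_id("customer", "ACME-0042")
again = mint_id("customer", "ACME-0042")
other = mint_id("customer", "ACME-0043")

print(first.startswith("urn:uuid:"))  # True
print(first == again)                 # True
print(first == other)                 # False
```

Because the ID depends only on the namespace and the business key, it survives database migrations, host renames and URL changes.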