Sunday, December 7, 2008

Semantic Web Vocabularies for the Ancient World

As previously indicated, I'm working on an xhtml+rdfa representation of GRBPIlion.

At some point I will give a more general statement of why this is a good idea. Right now, I'm still very much in the planning/modeling phase. In particular, I'm interested in which pre-existing vocabularies I should be using. What follows is a lightly annotated list of potential candidates, some more obvious and stable than others.

General statements of properties and relationships:
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:skos="http://www.w3.org/2004/02/skos/core#"
xmlns:ov="http://open.vocab.org/terms/"

The Dublin Core is a well understood and widely used standard. Where it matches, it's a no-brainer to use it as a default. Currently, each record in the db has "dc:title" as a human readable title. E.g., "African Red Slip Hayes form 68".
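Here is a rough Python sketch of how a record's title could appear as an RDFa statement in XHTML. The record URI and the helper name `title_markup` are hypothetical placeholders, not GRBPIlion's actual identifiers. The only real vocabulary term used is `dc:title`.

```python
from html import escape

# Dublin Core elements namespace, as declared above.
DC_NS = "http://purl.org/dc/elements/1.1/"


def title_markup(record_uri: str, title: str) -> str:
    """Return an XHTML+RDFa fragment asserting dc:title for one record.

    record_uri is a hypothetical identifier for a db record; escape()
    keeps characters such as & or quotes from breaking the markup.
    """
    return (
        f'<div xmlns:dc="{DC_NS}" about="{escape(record_uri)}">'
        f'<span property="dc:title">{escape(title)}</span>'
        "</div>"
    )


print(title_markup("http://example.org/grbpilion/record/1",
                   "African Red Slip Hayes form 68"))
```

An RDFa processor reading this fragment should yield one triple. Its subject is the record URI, its predicate is dc:title, and its object is the literal title.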

The Simple Knowledge Organization System (SKOS) is a W3C standard that has some uptake in the real world. dbpedia.org uses the skos:subject property to indicate membership in categories such as those found on Wikipedia. My use is semantically similar.
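Category membership via skos:subject comes down to simple triple patterns. Below is a minimal Python sketch that treats statements as plain (subject, predicate, object) tuples. The `grb:` prefix, the record and category names, and the `members` helper are all hypothetical.

```python
# Hypothetical statements written as (subject, predicate, object) tuples
# using CURIE-style names; "grb:" is an illustrative prefix only.
TRIPLES = [
    ("grb:record1", "skos:subject", "grb:category/ARS"),
    ("grb:record2", "skos:subject", "grb:category/ARS"),
    ("grb:record3", "skos:subject", "grb:category/amphora"),
    ("grb:record1", "dc:title", "African Red Slip Hayes form 68"),
]


def members(category, triples=TRIPLES):
    """Return subjects linked to category via skos:subject, in input order."""
    return [s for s, p, o in triples if p == "skos:subject" and o == category]


print(members("grb:category/ARS"))  # ['grb:record1', 'grb:record2']
```

A real deployment would hand this to an RDF store or a SPARQL query. The list comprehension just shows the shape of the lookup.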

The newly started Open Vocab was brought to my attention by Sean Gillies. More precisely, he mentioned it on Twitter, and since I follow him, I checked it out. OV is a nice staging ground for creating URIs for terms that aren't found in other vocabularies and for terms you just want to think about before choosing an existing standard.

Visual Documentation:
xmlns:vra="http://www.vraweb.org/vracore4.htm#"

Right now I use "vra:imageIs" to indicate that an external file is an image (whether svg or bit-mapped) of an object.
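The value of dc:title is a literal, but vra:imageIs points at another resource: the image file. In RDFa that relationship would naturally go on `rel` with an `href`, not on `property`. Here is a minimal Python sketch. The URIs and the helper name `image_link` are hypothetical, and the namespace is the one declared above.

```python
from html import escape

# VRA Core namespace as declared above.
VRA_NS = "http://www.vraweb.org/vracore4.htm#"


def image_link(object_uri: str, image_url: str, label: str = "image") -> str:
    """Link an object to an external image (svg or bitmap) with rel/href.

    Using rel makes the image a resource (a URI) rather than a text literal.
    """
    return (
        f'<div xmlns:vra="{VRA_NS}" about="{escape(object_uri)}">'
        f'<a rel="vra:imageIs" href="{escape(image_url)}">{escape(label)}</a>'
        "</div>"
    )


print(image_link("http://example.org/object/1",
                 "http://example.org/images/object1.svg"))
```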

Geography:
xmlns:gml="http://www.opengis.net/gml"
xmlns:georss="http://www.georss.org/georss"
xmlns:pleiades="http://pleiades.stoa.org/"
xmlns:batlas="http://atlantides.org/batlas/"

As suggested by S. Gillies, I've qualified some of the geographic markup with "ov:origin".
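GeoRSS-Simple writes a point as one string: latitude, a single space, then longitude, in WGS84 decimal degrees. Here is a small Python sketch of producing such a value. The function name and the sample coordinates are illustrative only, and the sketch does not attempt to show how ov:origin qualifies the markup.

```python
def georss_point(lat: float, lon: float) -> str:
    """Format a GeoRSS-Simple point value: "lat lon" in decimal degrees."""
    if not -90.0 <= lat <= 90.0:
        raise ValueError(f"latitude out of range: {lat}")
    if not -180.0 <= lon <= 180.0:
        raise ValueError(f"longitude out of range: {lon}")
    return f"{lat} {lon}"


# Illustrative coordinates; the result would sit in markup such as
# <span property="georss:point">39.957 26.239</span>
print(georss_point(39.957, 26.239))  # 39.957 26.239
```

The ordering matters here. Latitude comes first in GeoRSS, which is the reverse of the longitude-first order used by some other formats.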

Authorial/Responsibility Metadata:
xmlns:tei="http://www.tei-c.org/ns/1.0"

The Text Encoding Initiative (TEI) provides a richer set of tools than DC for indicating authorship and related concepts. Its P5 version also provides a complete and elegant standard for encoding digital documents. For now, I'm representing GRBPIlion in xhtml and rdf, because the combination has explicit W3C backing and is suitably lightweight for my purposes.

Data modeling:
xmlns:owl="http://www.w3.org/2002/07/owl#"
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
xmlns:xs="http://www.w3.org/2001/XMLSchema#"

No surprises here. I do think there will always be a place for asserting relationships that are not strongly typed by reference to a particular discipline. There is also a place for using RDF and OWL to map relationships that will be more richly defined at a later date.
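One reason the exact namespace strings above matter is that RDFa expands a CURIE such as `dc:title` by concatenating the prefix's URI with the local part. A namespace that lacks its trailing `#` or `/` therefore produces the wrong URI. Here is a minimal Python sketch of that expansion over a subset of the prefixes listed in this post. The `expand_curie` helper is hypothetical, and it ignores RDFa refinements such as default prefixes and safe CURIEs.

```python
# Subset of the prefix declarations listed in this post.
PREFIXES = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "ov": "http://open.vocab.org/terms/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "owl": "http://www.w3.org/2002/07/owl#",
    # The trailing "#" matters: without it, xs:string would expand to
    # ".../XMLSchemastring" instead of ".../XMLSchema#string".
    "xs": "http://www.w3.org/2001/XMLSchema#",
}


def expand_curie(curie, prefixes=PREFIXES):
    """Expand prefix:reference by plain string concatenation."""
    prefix, sep, reference = curie.partition(":")
    if not sep or prefix not in prefixes:
        raise KeyError(f"unknown or missing prefix in {curie!r}")
    return prefixes[prefix] + reference


print(expand_curie("owl:sameAs"))  # http://www.w3.org/2002/07/owl#sameAs
```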

What about CIDOC-CRM? I did not find an up-to-date, official-looking document that integrates RDF and CIDOC-CRM. I'm also concerned about using a standard whose official release appears only in Microsoft Word and PDF.

I should also explore ArchaeoML as implemented by Open Context but the site seems to be down right now. When I click through to individual databases, no records are being returned. I may be doing something wrong, but if not, I'm sure the site will come back soon.

Again, this is all highly preliminary. Constructive criticism would be very much appreciated.

1 comment:

Fran Sansalone said...

Sebastian -
I'm not sure how germane it is to your work, but I wanted to point out the Museum Computer Network (http://www.mcn.edu/). Koven Smith of the Metropolitan Museum of Art (koven.smith@metmuseum.org) is heading up a special interest group under their auspices; the SIG's focus is on lexicons. We will be working with them to develop a consolidated view of what museums need for lexicon extensions to the Calais service.
Fran Sansalone
Calais Community Manager
www.opencalais.com