Wednesday, December 24, 2008

For Profit Archaeology

The Milken Institute's Financial Innovations Labs have issued a report entitled Financial Innovations for Developing Archaeological Discovery and Conservation. It seems to be a call for archaeologists to participate in the profit-oriented market for antiquities, though the report certainly doesn't use that language. Even when mediated through the securitization of debt obligations backed by cash-flow from long-term loans, this is problematic. Archaeologists work to bring information about the past to the public, not to meet commercial demand for artifacts.

Friday, December 12, 2008

CIDOC-CRM

Sean Gillies has written an important memo, Concordia, Vocabularies, and CIDOC CRM, on Concordia's current approach to using the Comité International pour la Documentation des Musées - Conceptual Reference Model (CIDOC-CRM). It should be widely read by people interested in the digital publication of resources for the ancient Mediterranean and beyond. In it he gives a preliminary indication that RDFa - a standard for embedding the Resource Description Framework in HTML pages - provides a better route forward for the time being. But don't take my word for this; read his whole text.

RDFa has appeared on this blog before: PRAP, xhtml 2.0 and Archaeological Databases was early thinking, RDFa at Ilion is more recent, and nomisma.org also makes use of RDFa. So Sean's memo is welcome here because his reasoning is similar to mine.

But what of CIDOC-CRM? The main CIDOC-CRM website opens with:
The CIDOC Conceptual Reference Model (CRM) provides definitions and a formal structure for describing the implicit and explicit concepts and relationships used in cultural heritage documentation.
It also notes that the CRM is an ISO standard (ISO 21127:2006). That's a good thing.

In general, the main CIDOC-CRM website doesn't do a good job of introducing itself. If you want a quick feel for how the CRM organizes concepts, try the relevant section of Princeton's QED site. You'll see that the CRM provides a well-thought-out vocabulary of concepts for describing cultural heritage. Apart from the odd use of gendered language, it's useful that the CRM defines the concept E24 Physical Man-Made Thing. It will be cool when I can search the Internet for E24s within the Aegean that date to the Late Roman period. I'm guessing the CRM will play a role in enabling such functionality.

In terms of resources linked from the main CIDOC-CRM website, I've paid the most attention to the "mappings" page. I take heart in the work being done in this domain because of the implication that my use of the CRM can be indirect. This is encouraging because current self-representation by CIDOC seems to obscure notions of "best practice" in an over-abundance of detail. See this paper for an example. It is to the CRM's credit that it can represent all the concepts used there, but in many cases one does not have, nor need, this level of detail. I will be happy to use VRA, Dublin Core and any other vocabularies and ontologies that gain traction in the Semantic Web world and trust that these will be mapped to the CRM.

From my perspective, the fact that there is not a large amount of CRM-encoded original archaeological data easily available on the internet is an indication that the standard has not seen a high degree of real-world uptake. I understand that there is acceptance of the CRM and many initiatives discussing how it can be used (here), but I would very much like to see actual use with large datasets. I'm also interested in seeing projects that adopt the CRM as the original format for "born digital" data. Will that really happen?

This post represents thinking that I hope will change as we see real-world adoption of standards in cultural heritage. I'm agnostic as to what the future holds. For the present, I'm all for exploring vocabularies and ontologies that are moving towards RDFa representations.

Wednesday, December 10, 2008

Briefly: two books and a new resource

I am in the Thomas J. Watson Library of the Metropolitan Museum of Art. It's a very pleasant place to work and recommended for archaeologists visiting NYC. They have very strong holdings in Roman pottery.

I've paged and am using A. Camilli. 1999. Ampullae : balsamari ceramici di età ellenistica e romana [worldcat]. That's Italian for "Unguentaria", the common small ceramic bottles/flasks found in many contexts on Mediterranean sites. I stress this because the book is not about Early Christian/Late Roman ampullae associated with pilgrimage. If you're working with unguentaria, you want this book.

Next up is M. Berndt. 2003. Funde aus dem Survey auf der Halbinsel von Milet : (1992 - 1999) : kaiserzeitliche und frühbyzantinische Keramik [worldcat]. This is a very useful catalog for the period it covers. A noteworthy feature is that the 172 plates are on a CD in the back. Putting a CD in the back of a book is an inane long-term solution, so I want to go on record here as saying "Don't do it!". And if you do, "Don't use PDF!". But it wouldn't be entirely straightforward of me not to admit that I have the plates on my hard-drive. In the short term, yes, this information is useful. But who is going to have CD readers 20 years from now? Not many of us. And the text isn't available in digital form, so here I am checking a few things.

Finally, the American Numismatic Society has initiated a project to establish stable URIs for numismatic concepts and entities. It's at nomisma.org. Take a look but be gentle since it's all in early stages.

Sunday, December 7, 2008

Semantic Web Vocabularies for the Ancient World

As previously indicated, I'm working on an xhtml+rdfa representation of GRBPIlion.

At some point I will give a more general statement of why this is a good idea. Right now, I'm still very much in the planning/modeling phase. In particular, I'm interested in which pre-existing vocabularies I should be using. What follows is a lightly annotated list of potential candidates, some more obvious and stable than others.

General statements of properties and relationships:
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:skos="http://www.w3.org/2004/02/skos/core#"
xmlns:ov="http://open.vocab.org/terms/"

The Dublin Core is a well understood and widely used standard. Where it matches, it's a no-brainer to use it as a default. Currently, each record in the db has "dc:title" as a human readable title. E.g., "African Red Slip Hayes form 68".
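
To make the prefix notation concrete, here is a minimal Python sketch of how a name like "dc:title" expands to a full URI. The prefix map is copied from the declarations listed above; the function name expand_curie is my own invention for illustration and not part of any standard library.

```python
# Prefix-to-namespace map, copied from the xmlns declarations in this post.
PREFIXES = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "ov": "http://open.vocab.org/terms/",
}


def expand_curie(curie: str, prefixes: dict = PREFIXES) -> str:
    """Expand a prefixed name such as 'dc:title' into a full URI.

    Hypothetical helper for illustration only.
    """
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"unknown or missing prefix in {curie!r}")
    return prefixes[prefix] + local


title_uri = expand_curie("dc:title")
print(title_uri)  # http://purl.org/dc/elements/1.1/title
```

The point is only that the short form and the long URI name the same property, so two datasets that both use "dc:title" are making comparable statements.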

The Simple Knowledge Organization System (SKOS) is a W3C standard that has some uptake in the real world. dbpedia.org uses the skos:subject property to indicate membership in categories such as those found on Wikipedia. My use is semantically similar.

The newly-started Open Vocab was brought to my attention by Sean Gillies. More precisely, he mentioned it on twitter and since I follow him, I checked it out. OV is a nice staging ground for creating URIs for terms that aren't found in other vocabularies and for terms you just want to think about before choosing an existing standard.

Visual Documentation:
xmlns:vra="http://www.vraweb.org/vracore4.htm#"

Right now I use "vra:imageIs" to indicate that an external file is an image (whether svg or bit-mapped) of an object.

Geography:
xmlns:gml="http://www.opengis.net/gml"
xmlns:georss="http://www.georss.org/georss"
xmlns:pleiades="http://pleiades.stoa.org/"
xmlns:batlas="http://atlantides.org/batlas/"

As suggested by S. Gillies, I've qualified some of the geographic markup with "ov:origin".

Authorial/Responsibility Metadata:
xmlns:tei="http://www.tei-c.org/ns/1.0"

The Text Encoding Initiative (TEI) provides a richer set of tools than DC for indicating authorship and related concepts. Its version P5 also provides a complete and elegant standard for encoding digital documents. For now, I'm representing GRBPIlion in xhtml and rdf, because the combination has explicit W3C backing and is suitably lightweight for my purposes.

Data modeling:
xmlns:owl="http://www.w3.org/2002/07/owl#"
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
xmlns:xs="http://www.w3.org/2001/XMLSchema"

No surprises here. I do think there will always be a place for asserting relationships that are not strongly typed by reference to a particular discipline. The same goes for using RDF and OWL to map relationships that will be more richly defined at a later date.

What about CIDOC-CRM? I did not find an up-to-date and official-looking document that integrates RDF and CIDOC-CRM. I'm also concerned about using a standard whose official release appears only in Microsoft Word and PDF.

I should also explore ArchaeoML as implemented by Open Context but the site seems to be down right now. When I click through to individual databases, no records are being returned. I may be doing something wrong, but if not, I'm sure the site will come back soon.

Again, this is all highly preliminary. Constructive criticism would be very much appreciated.

Thursday, December 4, 2008

RDFa at Ilion

The following will seem cryptic and I promise to give more detail later...

If anybody is interested in a draft RDFa representation of the GRBPIlion database, then point your parser at http://classics.uc.edu/troy/grbpottery/database.html.
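
For readers without an RDFa parser to hand, here is a rough Python sketch of the kind of thing a parser pulls out of such a page. It is not a real RDFa processor: it only collects the "property" attribute and the text of flat (un-nested) elements, and the sample markup is invented for illustration, so the actual page may be structured differently.

```python
from html.parser import HTMLParser

# Hypothetical record markup, invented for illustration; the real page may differ.
SAMPLE = """
<div xmlns:dc="http://purl.org/dc/elements/1.1/" about="#record-1">
  <span property="dc:title">African Red Slip Hayes form 68</span>
</div>
"""


class PropertyCollector(HTMLParser):
    """Collect (property, text) pairs from flat elements with a property attribute."""

    def __init__(self):
        super().__init__()
        self.pairs = []
        self._current = None  # property name of the element being read, if any
        self._text = []       # text fragments seen inside that element

    def handle_starttag(self, tag, attrs):
        prop = dict(attrs).get("property")
        if prop is not None:
            self._current = prop
            self._text = []

    def handle_data(self, data):
        if self._current is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        # Assumes no nested tags inside the element carrying the property.
        if self._current is not None:
            self.pairs.append((self._current, "".join(self._text).strip()))
            self._current = None


collector = PropertyCollector()
collector.feed(SAMPLE)
print(collector.pairs)
```

A conforming RDFa parser does much more (subjects, prefixes, datatypes), so treat this only as a way to see which statements the markup is trying to make.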

It even uses ov:origin! (sort of)

It's all in pursuit of the four goals given by Tim Berners-Lee in his Linked Data paper.
  1. Use URIs as names for things.
  2. Use HTTP URIs so that people can look up those names.
  3. When someone looks up a URI, provide useful information.
  4. Include links to other URIs so that they can discover more things.
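
The second point in the list is easy to check mechanically. The following Python sketch, using only the standard library, tests whether a name is an HTTP URI. The function name is_http_uri and the "urn:" example are my own, added for contrast; the other two URIs appear in these posts.

```python
from urllib.parse import urlparse


def is_http_uri(uri: str) -> bool:
    """Return True if the URI uses http or https and names a host."""
    parts = urlparse(uri)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


candidates = [
    "http://classics.uc.edu/troy/grbpottery/database.html",
    "http://pleiades.stoa.org/",
    "urn:example:record-1",  # hypothetical non-HTTP name, not from the posts
]
for uri in candidates:
    print(uri, is_http_uri(uri))
```

This says nothing about the third and fourth points, which depend on what the server actually returns when the URI is looked up.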


Not there yet, but trying.