Tuesday, November 2, 2010

Progress on Museum URIs

I'm including the full text of an e-mail sent by Martin Doerr of the Center for Cultural Informatics on Crete. It's been forwarded to me by a couple of people and there's a call for comment towards the end, so it seems to be a public document. That's good, because it's an excellent step forward in promoting stable URIs for museum collections. From my perspective, it mostly speaks for itself. Section 7 did cause some concern:
...

Under this consideration, Dominic proposes for the British Museum (http://www.britishmuseum.org/), that all objects of the Museum should be identified on the Semantic Web by the following: http://collection.britishmuseum.org/object/ followed by the "PRN number".

For instance, the Rosetta Stone has the PRN number: YCA62958, hence the "official" URI of the Rosetta stone is: http://collection.britishmuseum.org/object/YCA62958 . This URI should never become direct address of a document.

Just to be clear, if a user cuts and pastes 'http://collection.britishmuseum.org/object/YCA62958' into an address bar, or a document links directly to it (which I've just done), that should produce a human-readable page. I'd like to see that happen without redirection. If you redirect to that same URL with ".html" appended, then authors will cut and paste that longer string into their documents. If a good non-crufty URI exists, that's what should appear in address bars and that's what should stand as the 'permalink'.
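
As a rough illustration of serving both audiences from one address, here is a minimal Python sketch of server-side content negotiation. It is not how the BM or anyone else actually does this, and it deliberately differs from the redirect approach described in the e-mail below. The function names, the SUPPORTED list, and the choice of application/rdf+xml as the machine-readable format are all my assumptions. The idea is that the object URI answers directly with status 200, picking HTML or RDF from the request's Accept header.

```python
# Hypothetical sketch: one URI, two representations, no redirect.
SUPPORTED = ("text/html", "application/rdf+xml")  # assumed formats


def parse_accept(header):
    """Return the media types in an Accept header, most preferred first."""
    ranked = []
    for position, part in enumerate(header.split(",")):
        fields = [field.strip() for field in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        quality = 1.0  # default weight when no q parameter is given
        for field in fields[1:]:
            name, _, value = field.partition("=")
            if name.strip().lower() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0
        if quality > 0:  # q=0 means "not acceptable"
            ranked.append((-quality, position, media_type))
    return [media_type for _, _, media_type in sorted(ranked)]


def choose_representation(accept_header):
    """Pick a supported media type, falling back to HTML for people."""
    for media_type in parse_accept(accept_header or ""):
        if media_type in SUPPORTED:
            return media_type
        if media_type in ("*/*", "text/*"):
            return "text/html"
    return "text/html"


def respond(accept_header):
    """Return (status, headers) for a request to an object URI."""
    media_type = choose_representation(accept_header)
    # Vary tells caches that the body depends on the Accept header.
    return 200, {"Content-Type": media_type, "Vary": "Accept"}
```

A browser sending a typical Accept header would get HTML at the bare URI, while a client asking for application/rdf+xml would get RDF from the same address, so the short URI is the one that stays in the address bar.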

More generally, URIs should promote unity and overlap, not division, between the "semantic web" and the "plain-old web" (POW).

Section 7 also endorses URIs that have a different domain name from the institution itself, e.g. the "collection." in front of "britishmuseum.org". I don't like that. The reason given is to avoid the implications of name changes in the future. Ugh. Institutions should formally endorse the URIs they mint and make them as simple and short as possible. This decision should be taken at the highest levels of the institution. In the BM's case, that may mean the 25-member Board of Trustees.


Finally, the excellent and useful Europeana is mentioned. I'll take this opportunity to note that while http://europeana.eu/portal/record/00401/034BEA5CC6F88ADC6E7DCF5D7C5FECEA8FF85528.html works, http://europeana.eu/portal/record/00401/034BEA5CC6F88ADC6E7DCF5D7C5FECEA8FF85528 doesn't. It should.
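
One way to make both forms work is to treat a trailing ".html" as an optional format hint rather than as part of the record's identity. The Python sketch below is hypothetical and says nothing about how Europeana's portal is actually built; split_record_path and KNOWN_EXTENSIONS are names I've made up for illustration.

```python
import posixpath

# Assumed set of format suffixes a site might choose to recognise.
KNOWN_EXTENSIONS = {".html", ".rdf", ".json"}


def split_record_path(path):
    """Split a request path into (record path, format extension).

    The extension is "" when the path carries no recognised suffix,
    so the bare path and the suffixed path identify the same record.
    """
    base, extension = posixpath.splitext(path)
    if extension.lower() in KNOWN_EXTENSIONS:
        return base, extension.lower()
    return path, ""
```

With something like this in front of the record lookup, the path with ".html" and the path without it would resolve to the same record, and the suffix would only influence which format is returned.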



Dear colleagues,

I'd like to inform you about our discussion today with Dominic Oldman,
Deputy Head of Information Systems, British Museum, his team and
representatives of the Research Space project
(http://sites.google.com/site/rspaceproject/the-team):

1. It is necessary that museum objects are uniquely identified by
suitable URIs in Semantic Web applications.

2. In order to avoid that everybody invents a new URI for the same
object, there should be one authority known to the whole world that
assigns such a URI.

3. This authority is naturally the museum that keeps the object,
because it is the only institution that can verify that two
different use cases of museum object URIs actually describe the same
thing.

4. This URI should be derived in a simple way from the inventory
numbers published in exhibition catalogues, on on-line museum
catalogue access or by asking museum staff, to avoid an error-prone
equivalence matching process.

5. This URI should have a form that enables any museum that wishes
to do so to provide a Linked Open Data service resolving to the
description of that object. Note, that this URI must not be the URL
of an existing document about the object, but it must activate a
standard mechanism prescribed by the Linked Open Data Initiative to
redirect to a document saying what the URI means.

6. This museum object URI will continue to be useful for
communicating uniquely about the object, even if the museum never
installs an LoD service, or if the way of dealing with LoD
resolution requests changes.

7. The way to create this URI should be the following: The museum
decides a base URL that will be extended by the inventory number of
the object. The base URL could be within the domain name of the main
museum Website, but in order to stay clear of possible name change
of the latter, a new domain name might be advisable. Also, for
larger museums, resolving LoD access requests to object information
may cause some server load, that can more easily be balanced with a
second name.

Under this consideration, Dominic proposes for the British Museum
(http://www.britishmuseum.org/), that all objects of the Museum
should be identified on the Semantic Web by the following:
http://collection.britishmuseum.org/object/ followed by the "PRN
number".

For instance, the Rosetta Stone has the PRN number: YCA62958, hence
the "official" URI of the Rosetta stone is:
http://collection.britishmuseum.org/object/YCA62958 . This URI
should never become direct address of a document.

It would be good if Europeana experts could comment on whether they
regard this as an adequate approach for Europeana, and could pass
this message on to other museums and providers so that they follow
this practice.

I intend to present this at the CIDOC Conference in Shanghai. I
would be very glad if Dominic and I could get a response within the
next week on whether you endorse the procedure and whether you will
support us in spreading the practice.

If you need further clarifications, please let me know as soon as
possible.

Best wishes,

Martin
--

--------------------------------------------------------------
Dr. Martin Doerr
Research Director
Center for Cultural Informatics
Information Systems Laboratory
Institute of Computer Science
Foundation for Research and Technology - Hellas (FORTH)
Vassilika Vouton, P.O. Box 1385, GR71110 Heraklion, Crete, Greece

Vox: +30(2810)391625
Fax: +30(2810)391638
Email: martin@ics.forth.gr
Web-site: http://www.ics.forth.gr/isl
--------------------------------------------------------------

4 comments:

leifuss said...

Hi Sebastian

Great post and an interesting development. Only thing I'd differ on is the issue of having separate domains which I don't really see as a big problem. In the case of the BM, collections.bm.org and www.bm.org are just subdomains of bm.org so it remains clear that it's the same organisation publishing them.

I like the idea of publishing URI guidelines for museums. The key thing now will be explaining the incentives for using them rather than attempting to impose them from above.

Mia said...

The Museums Computer Group has had lots of discussion of this in the past, IIRC - getting agreement that URIs are something worth doing might be easier these days, but guidelines would also be useful.

I'd like to see the suggestions tested against the requirements of a few other museums before converting them into 'guidelines'. For example Science Museum/NMSI accession numbers contain '/' (used for parts of a whole) - I've never known what, if any, problems that might cause when building URIs.

Mia said...

In case it's useful, here's a link to some previously collected links on other discussions in the sector: http://museum-api.pbworks.com/w/page/Permanent-IDs

Robert Huber said...

This reminds me somewhat of the introduction of LSIDs (Life Science Identifiers) by TDWG and GBIF some years ago. It is always tricky to transfer digital concepts to real objects.
Anyway, in addition to URIs it might be a good idea to evaluate other digital identifiers, e.g. DOIs, or even PURLs (persistent URLs). I personally am thinking about introducing PURLs for CollectConcept