Wednesday, November 3, 2010

Responses to "Progress on Museum URIs"

Three people responded to yesterday's post on museum URIs.

Leif Isaksen left a comment to the effect that he's not too concerned about differing base URIs for museum collections. I agree that there are worse things than the string "collection." in "". The original explanation was to reduce load on an individual server. Without meaning to get too technical, the "/object" can be an effective load reducer by passing requests to a proxy. Bottom line: in an ideal world, I'd drop the "collection.", but I'm not too worked up about it.

Eric Kansa responded on his blog. His point had an interesting overlap with an e-mail I received. I won't quote that in its entirety as the author could have made it public if s/he wanted to. Here's a snippet:
but to me it seems a very bad idea to think that only museums can claim the right to designate URIs for their objects; there should be a standard that can be used by museums as well as by scientists outside of museums...
I took this as responding largely to
2. In order to avoid that everybody invents a new URI for the same
object, there should be one authority known to the whole world that
assigns such a URI.

3. This authority is naturally the museum that keeps the object,
because it is the only institution that can verify that two
different use cases of museum object URIs actually describe the same
Taking Eric's and Anonymous' comments together, I read them as calling for a multi-vocal internet in which many agents can assert an identity for an object, with those identities together forming a distributed and diverse commentary on the human past. I totally agree. To be self-critical, I may well have mis-read M. Doerr's e-mail. If he's calling for recognition of the exclusive right of museums to identify their objects, that's a non-starter. It's neither the right thing to do nor is it possible. On first reading, I took his e-mail to represent a welcome assumption of responsibility by museums to provide a locus of stability for reference to their collections. But to be clear, objects will have multiple identifiers. Referring back to a common identifier promoted by and discoverable at the holding institution will ease the process of recognizing that two or more identifiers refer to the "same thing". That will itself promote the idea of a discoverable and multi-vocal discussion about the past.

Tuesday, November 2, 2010

Progress on Museum URIs

I'm including the full text of an e-mail sent by Martin Doerr of the Center for Cultural Informatics on Crete. It's been forwarded to me by a couple of people and there's a call for comment towards the end so it seems to be a public document. That's good because it's an excellent step forward in promoting stable URIs for museum collections. From my perspective, it mostly speaks for itself. Section 7 did cause some concern:

Under this consideration, Dominic proposes for the British Museum (, that all objects of the Museum should be identified on the Semantic Web by the following: followed by the "PRN number".

For instance, the Rosetta Stone has the PRN number: YCA62958, hence the "official" URI of the Rosetta stone is: . This URI should never become direct address of a document.
Just to be clear, if a user cuts-and-pastes '' into an address bar, or a document links directly to that (which I've just done), that should produce a human readable page. I'd like to see that happen without redirection. If you redirect to that same URL with ".html" appended, then authors will cut-and-paste that string into their documents. If a good non-crufty URI exists, that's what should appear in address bars and that's what should stand as the 'permalink'.
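
To make the scheme in section 7 concrete, here is a minimal Python sketch of minting an object URI by extending a base URL with an inventory number. The base URL is a placeholder chosen for illustration, not the one proposed for the British Museum, and `mint_object_uri` is a hypothetical helper name.

```python
from urllib.parse import quote

# Placeholder base URL (an assumption for this sketch, not the museum's real one).
BASE = "http://collection.example.org/object/"


def mint_object_uri(inventory_number: str, base: str = BASE) -> str:
    """Extend the base URL with a percent-encoded inventory number."""
    cleaned = inventory_number.strip()
    if not cleaned:
        raise ValueError("inventory number must not be empty")
    # safe="" so that characters such as "/" or "," cannot alter the path.
    return base + quote(cleaned, safe="")


print(mint_object_uri("YCA62958"))
```

The percent-encoding only matters if an inventory number contains characters such as commas, slashes or spaces; a plain PRN like YCA62958 passes through unchanged.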

More generally, URIs should promote unity and overlap, not division, between the "semantic web" and the "plain-old web" (POW).

Section 7 also endorses URIs that have a different domain name from the institution itself, e.g. the "collection." in front of "". I don't like that. The reason given is to avoid the implications of name changes in the future. Ugh. Institutions should formally endorse the URIs they mint and make them as simple and short as possible. This decision should be taken at the highest levels of the institution. In the BM's case, that may mean the 25-member Board of Trustees.

Finally, the excellent and useful Europeana is mentioned. I'll take this opportunity to note that while works, doesn't. It should.

Dear colleagues,

I'd like inform you about our discussion today with Dominic Oldman,
Deputy Head of Information Systems, British Museum, his team and
representatives of the Research Space project

1. It is necessary that museum objects are uniquely identified by
suitable URIs in Semantic Web applications.

2. In order to avoid that everybody invents a new URI for the same
object, there should be one authority known to the whole world that
assigns such a URI.

3. This authority is naturally the museum that keeps the object,
because it is the only institution that can verify that two
different use cases of museum object URIs actually describe the same

4. This URI should be derived in a simple way from the inventory
numbers published in exhibition catalogues, on on-line museum
catalogue access or by asking museum staff, to avoid an error-prone
equivalence matching process.

5. This URI should have a form that enables any museum that wishes
to do so to provide a Linked Open Data service resolving to the
description of that object. Note, that this URI must not be the URL
of an existing document about the object, but it must activate a
standard mechanism prescribed by the Linked Open Data Initiative to
redirect to a document saying what the URI means.

6. This museum object URI will continue be useful for communicating
uniquely about the object, even if the museum never will install an
LoD service, or if the way of dealing with LoD resolution requests
will change.

7. The way to create this URI should be the following: The museum
decides a base URL that will be extended by the inventory number of
the object. The base URL could be within the domain name of the main
museum Website, but in order to stay clear of possible name change
of the latter, a new domain name might be advisable. Also, for
larger museums, resolving LoD access requests to object information
may cause some server load, that can more easily be balanced with a
second name.

Under this consideration, Dominic proposes for the British Museum
(, that all objects of the Museum
should be identified on the Semantic Web by the following: followed by the "PRN number".

For instance, the Rosetta Stone has the PRN number: YCA62958, hence
the "official" URI of the Rosetta stone is: . This URI
should never become direct address of a document.

It would be good, if Europeana experts to comment, if they regard is
an adequate approach for Europeana, and could transfer this message
to other museums and providers to follow this practice.

I intend to present this on the CIDOC Conference in Shanghai. I
would be very glad if I and Dominic could get a response within the
next week, if you endorse the procedure, and if you will support us
to spread the practice.

If you need further clarifications, please let me know as soon as

Best wishes,


Dr. Martin Doerr | Vox:+30(2810)391625 |
Research Director | Fax:+30(2810)391638 |
| Email: |
Center for Cultural Informatics |
Information Systems Laboratory |
Institute of Computer Science |
Foundation for Research and Technology - Hellas (FORTH) |
Vassilika Vouton,P.O.Box1385,GR71110 Heraklion,Crete,Greece |
Web-site: |

Thursday, October 28, 2010

Ancient Mediterranean Objects at the NMHN

Using to track URIs. Here's an Ancient Mediterranean object at the National Museum of Natural History.

If you're reading this at , that's part of the experiment as well.

Saturday, October 16, 2010

Change Happens (if it can)

As the result of an e-mail exchange with Neel Smith, one of the designers of the Canonical Text Service Protocol, I've come up with the following formulation:
If a character in a URL can change, it will.
I'm not the only person to think this but I just wanted to get that thought out in simple, direct language.

But what do I mean? Take Worldcat URLs such as That "www." is annoying and should not be part of the URL that Worldcat presents as its permanent identifier for the book. At some point in the future, somebody there will realize this and remove those unnecessary characters. But Now you're talking! And look, it already works.

It's true that the "oclc" could be shortened so maybe I need to qualify the formulation, but I'm not going to for the following reason. Changing those characters would risk collision with other identifying schemes that Worldcat supports such as . The 'www.' is unstable because it can be removed without breaking anything.

The simple formulation stands: If a character in a URL can change, it will.

The implication is, "be aggressive about removing all unnecessary characters from your URLs." The following is a horror-show:
It just looks unstable. Leading me to another formulation:
If a URL looks unstable, it is.
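
One practical consequence for anyone storing links: compare them in a form that ignores the characters most likely to disappear. The following is a small sketch, assuming the only unstable part is a leading "www." on the host. `drop_www` is a hypothetical helper and the URL in the example is a placeholder.

```python
from urllib.parse import urlsplit, urlunsplit


def drop_www(url: str) -> str:
    """Return the URL with a leading 'www.' removed from the host, if present."""
    parts = urlsplit(url)
    host = parts.netloc
    if host.lower().startswith("www."):
        host = host[len("www."):]
    return urlunsplit((parts.scheme, host, parts.path, parts.query, parts.fragment))


# Two spellings of the same placeholder identifier compare equal after normalizing.
print(drop_www("http://www.example.org/oclc/12345") == drop_www("http://example.org/oclc/12345"))
```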

Tuesday, September 21, 2010

Discussing Citation by Example

I've started a set of pages at the Digital Classicist Wiki on the topic of Citation in digital scholarship. In progress, under construction, etc., etc., etc.

The goal is to move existing practice towards a broad understanding of how to make citations to such categories of evidence as primary written sources, geographic entities, cataloged objects, and secondary scholarship so that those citations are:
  • Clearly identified in a robust yet rich fashion
  • Recognizable by automatic agents
  • To resources that are stable over the long-term

But I don't think it will be possible to establish and drive adoption of one very detailed standard. Better to have a simple notation - I follow others in suggesting 'class="citation"' for (x)html - that can indicate the presence of more detailed markup. I'm a fan of RDFa so I further discuss that on the page "Citations with added RDFa".
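
As a rough illustration of what "recognizable by automatic agents" could mean for the simple notation, here is a Python sketch that pulls the text of every element carrying the class "citation". It uses only the standard library. `CitationFinder` and `find_citations` are names invented for the example, and the sketch assumes reasonably well-formed markup.

```python
from html.parser import HTMLParser

# Elements with no closing tag; they must not affect the nesting count.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class CitationFinder(HTMLParser):
    """Collects the text content of elements whose class list includes 'citation'."""

    def __init__(self):
        super().__init__()
        self.citations = []
        self._depth = 0    # nesting depth inside the current citation element
        self._buffer = []  # text fragments of the current citation

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        if self._depth:
            self._depth += 1
        elif "citation" in classes:
            self._depth = 1
            self._buffer = []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self._depth:
            return
        self._depth -= 1
        if self._depth == 0:
            self.citations.append("".join(self._buffer).strip())

    def handle_data(self, data):
        if self._depth:
            self._buffer.append(data)


def find_citations(html: str) -> list:
    finder = CitationFinder()
    finder.feed(html)
    finder.close()
    return finder.citations


sample = '<p>See <span class="citation">Hom. <i>Il.</i> 23.100</span>.</p>'
print(find_citations(sample))
```

An agent that finds such an element can then look inside it for richer markup such as RDFa, which is the point of keeping the outer notation simple.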

The Digital Classicist community is pretty open and I'm very grateful to G. Bodard (a.k.a. palaeofuturist) for saying the equivalent of "Go for it." when I raised the possibility of hosting these materials in his realm.

There's a category for all the pages and I hope that list will grow.

Wednesday, September 8, 2010

References that just work (but I understand it's not that simple...)

Go to Google. Type in "John 20:24", then hit return. You can even click the "I'm Feeling Lucky" button. Or here's a direct link.

Or try the same thing in Bing (which provides results for Yahoo), and Altavista.

As you'll see, all three searches get you to the relevant passage of the Gospel According to John. And if you poke around on the Biblegateway site, you'll see various translations (but where's the Vulgate?).

That's impressive. It indicates that human readable references can become so stable that automated agents are able to correctly translate them into links to particular chunks of primary text.

Here are some variations on the theme (all in Google):
"jean 20:24" (at Not spot on, but pretty close.
"1 John 2:1": That's a reference to the first epistle of John. Entered into the "address bar" in Chrome. Seems to work.
"John 3": Unqualified chapter reference. Good to go.
"ephesians 1:2": Works.
"eph. 1:2": That abbreviation is OK.
"eph 1.2": Things become fuzzier if I don't use the ':' that is conventional in references to Christian scripture.
"Ephesians 2:4-10": Spans work as well, when properly formatted.

Again, I think this is interesting. Taking the New Testament as a corpus of Ancient Mediterranean texts that were written between the mid-first and third (at the latest: the Epistle of James 1 isn't definitively quoted until Origen) centuries AD makes it relevant to the study of the Ancient World as a whole. As a corpus, it's been around for a long time. Athanasius's letter of AD 367 is one conventional date for the determination of what was in, and what was out.

Those comments aside, the point remains that it is possible to automatically reverse engineer the citation scheme of a very stable corpus. I guess one caveat is that I don't absolutely know that Google, Bing, etc. haven't special cased strings that are plausibly references to the NT. Any ideas?
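
To show how regular the scheme is, here is a rough Python sketch that parses the variations tried above. It makes no claim about how any search engine actually works. `Reference` and `parse_reference` are names made up for the example, book names are not checked against any list, and spans that cross chapters are deliberately left out.

```python
import re
from typing import NamedTuple, Optional


class Reference(NamedTuple):
    book: str
    chapter: int
    verse: Optional[int]
    end_verse: Optional[int]


REFERENCE = re.compile(
    r"""
    ^\s*
    (?P<book>(?:[1-3]\s+)?[A-Za-z]+)  # optional ordinal plus book name or abbreviation
    \.?                               # abbreviations may end in a full stop
    \s+
    (?P<chapter>\d+)
    (?:
        [:.](?P<verse>\d+)            # ':' is conventional, '.' is tolerated
        (?:-(?P<end>\d+))?            # optional span within the chapter
    )?
    \s*$
    """,
    re.VERBOSE,
)


def parse_reference(text: str) -> Optional[Reference]:
    """Return the parsed reference, or None if the string does not fit the pattern."""
    match = REFERENCE.match(text)
    if match is None:
        return None

    def to_int(value):
        return int(value) if value is not None else None

    return Reference(
        book=" ".join(match.group("book").split()),
        chapter=int(match.group("chapter")),
        verse=to_int(match.group("verse")),
        end_verse=to_int(match.group("end")),
    )


for query in ["John 20:24", "1 John 2:1", "John 3", "eph 1.2", "Ephesians 2:4-10"]:
    print(query, "->", parse_reference(query))
```

The hard part for a real agent is not the pattern but knowing which book names and abbreviations belong to a stable corpus, which is exactly what this sketch leaves out.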

My larger goal is to think about references to so-called "primary texts" that just work. Given the above, my ad hoc, working definition of "primary text" is any text with a sufficiently stable name and citation scheme that search engines can find it. Sure, that's circular and incomplete, but it will do for now.

Let's try some others:
"gilgamesh 3": Muddled.
"gilgamesh tablet 3": Better.
"Iliad 23": Not bad. No Greek.
"Iliad 23.100": Individual line references don't work.
"Homer Iliad 23.100": Not better.
"Quran 32": I see it as the third link.
"hemingway, the old man and the sea": For comparison. Wikipedia is the top page for me; that's not the text itself. And Amazon is up there, as in the work is in copyright so I'd have to pay. Not sure I want to follow the links that say I can download the text for free.

A major distinction between references to NT texts and the second group is the ability of Google to handle full chapter and verse ('n:n') references. That doesn't seem to work for the Iliad. That's worth exploring.

If I go to Perseus and use the search box at the upper right, "homer iliad 23.100" doesn't work directly. Nor does "iliad 23.100". But "Hom. Il. 23.100" does. If I try that string in Google (link), it gets me to the Chicago version of the Perseus texts (via the 2nd ranked link when I tried it.). [I'll take this opportunity to note that the Chicago Perseus is wicked, and that it's likewise wicked cool that Perseus texts are licensed so this redundancy is possible.]

That kind of variation is one of the reasons I parenthetically qualified the title of this post. References to "primary texts" - and other texts for that matter - are not simple. In this post - as is often my wont - I've let myself be drawn along by current practice. I really do like to see what people are actually doing and how data actually works on the Internet. If you want a more substantive discussion of the problems of citation, I highly recommend Neel Smith's "Citation in Classical Studies" in DHQ 2009. Here's the abstract:
Citation practice reflects a model of a scholarly domain. This paper first considers traditional citation practice in the humanities as a description of our subjects of study. It then describes work at the Center for Hellenic Studies on an architecture for digital scholarship that is explicitly based on this model, and proposes a machine-actionable but technologically independent notation for citing texts, the Canonical Text Services URN.
For now, let me say that it is correct for Google (via Biblegateway) to dereference a citation to John 7:53-8:11 (the Pericope Adulterae) or John 5:7 (the Comma Johanneum). Neither may have been in the "original" text of the Gospel of John, but references to them are semantically clear and have been used "in the wild" so need to be handled. But note that Google prioritizes discussion of the CJ over the text (or at least does when I'm trying it now). Again, see N. Smith on the implications of such variation.

Clearly it helps to have a committed body of believers and/or scholars working on very old texts. Energy and time make for stable references. But there is variability in functionality even within that group. I guess the long-term question is how do we move more texts into the category of "just working"? I am assuming we want to. And how do we support co-existence of the simple "reference following" alongside what Neel describes? Both are useful.

Monday, August 30, 2010

Numbered Paragraphs in Digital Humanities Quarterly

I can recommend Patrik Svensson's article "The Landscape of Digital Humanities" in Digital Humanities Quarterly as a good read. My comments here are about the internals of handling DHQ's paragraph-based citation scheme.

Quick intro to the issue: DHQ is an online journal. It doesn't have pages to provide a physical solution to the need to make references to specific points in an article. So the html version numbers each paragraph. So far so good. As a reader I can note the paragraph number and cite it in a future publication.

But I'm not sure DHQ has quite the right implementation of this good idea. I'm arbitrarily picking the paragraph numbered 118. The one that starts, "Information technology, or more broadly the digital, can be seen as affording objects of analysis for the humanities."

Note that I don't include a link directly to that paragraph. That's because I can't. Looking at the HTML source, I see:
<div class="counter">118</div><div class="ptext">Information technology, or more broadly the digital, can be seen as affording...

That's somewhat unfortunate. It would be great if the '<div class="ptext">' were changed to read '<div id="p118" class="ptext">'. Then I could mint a URL of the form:

It would be even cooler if the <div class="counter">118</div> also read:
<div class="counter"><a href="#p118">118</a></div>

I've wrapped the paragraph number in a link to the paragraph. That way a user can right/control-click on the link and copy-and-paste it into an e-mail or other work. Easy self-reference to an internal citation structure.
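
Both changes are mechanical, so they could be applied to every paragraph at once. Here is a minimal Python sketch, assuming the counter and text divs always appear as the adjacent pair quoted above. `add_paragraph_anchors` is a hypothetical name, and a regular expression is only safe here because the pattern is so narrow.

```python
import re

# Matches the counter div immediately followed by the paragraph text div.
PAIR = re.compile(r'<div class="counter">(\d+)</div>(\s*)<div class="ptext">')


def add_paragraph_anchors(html: str) -> str:
    """Give each paragraph an id and turn its counter into a link to that id."""

    def rewrite(match):
        number, gap = match.group(1), match.group(2)
        return (
            f'<div class="counter"><a href="#p{number}">{number}</a></div>{gap}'
            f'<div id="p{number}" class="ptext">'
        )

    return PAIR.sub(rewrite, html)


before = '<div class="counter">118</div><div class="ptext">Information technology...</div>'
print(add_paragraph_anchors(before))
```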

I'd also like to see the paragraph numbers represented in the XML source. Again, taking a snippet of that, the start of the paragraph numbered as 118 in the html, appears in the xml as:
<p>Information technology, or more broadly the digital, can be seen as affording objects of analysis for the humanities...

Unless I'm missing something, the published citation scheme isn't represented in the archival version. I think it should be. Even if DHQ considers the paragraph number ephemeral, I think there's a valid scholarly need for them to be persistent.
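
Recording the numbers in the archival XML could be as simple as an attribute on each paragraph. The sketch below assumes plain, un-namespaced 'p' elements and an attribute called "n". Both are assumptions for illustration, since DHQ's actual schema isn't examined here, and `number_paragraphs` is a made-up name.

```python
import xml.etree.ElementTree as ET


def number_paragraphs(xml_text: str) -> str:
    """Add a sequential n attribute to every p element, in document order."""
    root = ET.fromstring(xml_text)
    for number, paragraph in enumerate(root.iter("p"), start=1):
        paragraph.set("n", str(number))
    return ET.tostring(root, encoding="unicode")


print(number_paragraphs("<text><p>First.</p><p>Second.</p></text>"))
```

Numbers assigned this way would shift if a paragraph were ever inserted, so a journal would want to assign them once at publication and then store them rather than recompute them.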

I'm a big fan of DHQ so this is constructive criticism. And I'm sort of hoping that I've mis-understood something and that those paragraph numbers are more meaningful than they seem after one looks under the hood.

Tuesday, August 24, 2010

Corrected Versions of Papers

It's late August so my mind is on other things, like the next stretch of split-rail fence that I need to put in. But I do find it interesting that Heather Baker has used to distribute a corrected version of her paper "The layout of the ziggurat temple at Babylon" that first appeared in Nouvelles Assyriologiques Brèves et Utilitaires 2008.2 (Juin). Feel free to be similarly and vaguely inspired about issues of versioning, "scribal error", reference, etc.

Thursday, June 17, 2010

More Papers on

Sure, is far from perfect. But I continue to be psyched when I see people who have uploaded a bunch of papers or a book. Just a brief "thanks" to those who have added to my collection of digital offprints.

Thursday, June 10, 2010

When I first heard about, I pretty much ignored it. Basically, the last thing I needed was another social media site.

A few weeks ago, however, Chuck Jones sent me an invitation so I took another look. What caught my eye was the papers that users have uploaded. I'm always on the look out for digital content that I can't find elsewhere so the site's role as a central point for the discovery of articles, etc. is very useful.

Here's an incomplete list of some of my fellow "Academicians" whose scholarship I've downloaded:

I've tried to do my bit as well by pointing to some of my work that's available on-line. And here are links to the Institute for the Study of the Ancient World and American Numismatic Society pages. I'm intrigued by the opportunity to have a lightweight institutional repository that offers to an organization like the ANS.

Not everything is perfect about the site. The 'Department Viewer' - for want of a better name - relies on Flash. That's lame and doesn't work on an iPad. And I'm not sure the developers have realized that the papers are a real asset. It would be nice to be able to see all papers listed by people I'm following. Or all papers from a single department. And more liberal spreading around of "Follow this Person" buttons would be nice. If I'm looking at the page for a particular paper, I should be able to single-click to follow its author.

If you're interested, come join the fun. And remember to link to your digital scholarship. As you can tell, I think that's the real utility of the site.

Coin Hoards, Timelines and KML

is a KML file that shows findspots of hoards with coins of Thasos in them. You can see that file rendered with the Google Map API at

This post is about viewing the KML file in Google Earth. If you do that, you'll see a Timeline slider appear in the top left of the G Earth window. Slide the control to the right and you'll see an explosion of hoards towards the north following the mid-2nd century BC. It's really quite dramatic so give it a shot.

One part of an explanation is that Thasos started striking large numbers of new larger tetradrachms following 148 BC, with many of these traveling north. Exactly why is a matter of historical interpretation. The Roman province of Macedon was established in 146 B.C. and that had a profound effect on both the issuance and circulation of coinage. is about making this information more accessible so that more scholars can engage with the question.

Wednesday, June 2, 2010

Bibliographic Tools, Citations, and Digital Publications

Some preliminaries... I'm posting about digital publication of ancient world scholarship as part of my work at NYU's Institute for the Study of the Ancient World. I say that only to make clear that there's a practical element to my thinking about issues of citation, structure, metadata, etc. I will be helping to shepherd content into the digital realm and that means decisions, decisions, decisions. I am enjoying the focus this context gives to my thoughts. And since I, like most bloggers, live for little nuggets of feedback, those have been appreciated as well.

It's also important to stress that this is all happening within the intellectual context of ISAW. In other words, my new colleagues have pushed on these issues in interesting ways and I can take advantage of their previous work.

As in, let's talk about Zotero and the role it can play in providing a sustainable bibliographic framework for digital scholarship. This is already happening at the ISAW-hosted Pleiades project, and that lets me take a very practical approach to writing about the issue.

The Pleiades Zotero library is at It includes the item, which is the Zotero record for the article "Coastal Sites of Northeast Africa: The Case Against Bronze Age Ports". In case it isn't clear, the point of the article is pretty much that there weren't any.

In yesterday's post, I talked about citing secondary scholarship. Today, I'm interested in mediating those citations through Zotero bibliographic records.

The same basic pattern would apply: '<a rel="dc:references" href="">White and White 1996</a>' is a reference to the work described in that Zotero record. I am interested in the extent to which it is necessary to indicate that it is not a reference to the Zotero record itself but let me put that off for now. More relevant here is why use Zotero to establish unique identities for cited works at all?

The most compelling reason is that not all works will have such identifiers and Zotero allows one to create these. For example, is the Zotero record for an article that has no direct representation in Worldcat and which isn't online (I don't think). I.e., you're on your own in terms of a stable URI for this title.

And since consistency is good, it might be appropriate to create Zotero records for all cited works in a digital publication and only point to those.

This approach is also attractive because it allows linking to digital representations of titles as they become available. For example, the record for Coastal Sites could be linked to, which is the JSTOR record. A more compelling example is this link: That will take you to the Atypon-Link version of J. Cherry and W. Parkinson's Hesperia 2010 article on lithics from SW Greece. As the volumes of Hesperia roll over into JSTOR once they are past the 3 year wall, the Atypon URI will be either matched or replaced by an equivalent JSTOR URI. A Zotero record can have links to both versions without requiring any updating of the digital publication which points to that record.

And if more than one digital publication points to a Zotero record, that equivalency should be discoverable. I like that.

The big potential downer is: do we trust Zotero to be around for the foreseeable future? Or at least, will these URIs work over the very long term? I don't know the answer to that. This is one reason to ensure that each digital publication "knows" bibliographic metadata for all the citations it makes. Centuries from now, that information may be useful in tracking down readable versions of titles.

And here's a finishing twist. Regardless of which tool is used to generate URI-based unique identifiers for cited works, that same tool could (should? must?) be used to provide URI-based unique identifiers for the digital publication itself.

Tuesday, June 1, 2010

References in Digital Publications

Modern scholarship relies on citation. It's efficient in that one work can incorporate the results of another without having to repeat them. It's also a requirement of our modern academic culture that if you use somebody's idea, you give that person credit. There's more to be said on both points but this post is more about mechanics than purpose. (Though see here for a recent discussion of purpose. [I fall into the camp of: if you want credit for your work, make it easy to identify and be generous in giving credit to others. If you don't need credit, that's OK but still give it.]).

Back to references. They come in many forms in print works. In pre-linked media, among the purposes of citation is to give future readers the information they need to physically acquire the referenced work. That is, you take the title of the book or journal, go to the library to find the volume, and then start reading.

It is one of the great glories of the Internet that this physical labor is no longer always necessary. The simple construct '<a href="">I wrote this</a>' is rendered as 'I wrote this', so that a mere click takes you directly to the article.

That form of link is too simple to support modern scholarly practice. Citations of the form (Heath 2010) give a preliminary indication to the reader of who wrote a referenced work. Full information in footnotes further enriches the reading experience, but at the cost of possibly interrupting the flow of an argument, or depriving the reader of a collected bibliography at the end of a work. Choose your own preference, that's not my point here.

Instead, I am exploring specific patterns of markup that promote access to referenced works while also recording bibliographic metadata in a robust and sustainable fashion. Two needs, two solutions.

Here's some markup: Late Roman pottery is very visible in Aegean landscapes (<a rel="dcterms:references" href="">Pettegrew 2007</a>).

If we momentarily ignore the question of whether or not Handles records are good stable URIs for bibliographic resources, the semantics of this html are clear: it represents a citation of the 2007 article by David Pettegrew, The Busy Countryside of Late Roman Corinth. (Note: it doesn't reference the html page describing that title)

The use of the term "dcterms:references" in the RDFa rel attribute follows from the Dublin Core's Guidelines for Encoding Bibliographic Citation Information in Dublin Core Metadata. In this context 'references' is a verb, not a plural noun.

That html will render as: "Late Roman pottery is very visible in Aegean landscapes (Pettegrew 2007)." Again, this is all pretty clear.

It's also worth noting that the 'a' element in html is a building-block of our search-engine enabled world. Scholarship should not fight that, but use it. As many have said, "you get this for free."

I do, however, want to pair this reference with bibliographic metadata. Here's where some more RDFa comes in.

'' is a unique identifier for Pettegrew's article. This suggests the following snippet: <div about=""><span property="dcterms:bibliographicCitation">Pettegrew, D. (2007). "The Busy Countryside of Late Roman Corinth: Interpreting Ceramic Data Produced by Regional Archaeological Surveys" In <i>Hesperia</i> 76.4: 743-784.</span></div>

These two snippets can be adapted and combined with a little more RDFa scaffolding:
<html xmlns=""
xmlns:dcterms="" >
<body about="">
<h1>My Text</h1>
<p>Late Roman pottery is very visible in Aegean landscapes (<a rel="dcterms:references" href="">Pettegrew 2007</a>).</p>
<p about="" property="dcterms:bibliographicCitation">Pettegrew, D. (2007). "The Busy Countryside of Late Roman Corinth: Interpreting Ceramic Data Produced by Regional Archaeological Surveys" In <i>Hesperia</i> 76.4: 743-784.</p>

Pointing an RDFa extractor at that html gives:
@prefix rdf: <> .
@prefix : <> .
@prefix dcterms: <> .

   dcterms:references <> .

   dcterms:bibliographicCitation "Pettegrew, D. (2007). \"The Busy Countryside of Late Roman Corinth: Interpreting Ceramic Data Produced by Regional Archaeological Surveys\" In <i xmlns=\"\" xmlns:dcterms=\"\">Hesperia</i> 76.4: 743-784."^^rdf:XMLLiteral .

The shorter version of which is: references Pettegrew 2007 and even knows something about it. There are lots of third-party tools that can find this information when it is encoded in this way. And I could enrich the 'bibliographicCitation' to include parsable information on author, title, date, etc. That's for another time.
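
For readers who want to see the mechanics, here is a deliberately tiny Python sketch of pulling triples out of markup like the above. It is not a conforming RDFa processor. It handles only 'about', 'rel' with 'href', and 'property' with text content. It leaves prefixed names such as "dcterms:references" unexpanded and returns plain text rather than an XMLLiteral. `TinyRDFa` and `extract_triples` are names invented for the example, and the example.org URIs are placeholders.

```python
from html.parser import HTMLParser

# Elements with no closing tag; they are never pushed on the subject stack.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class TinyRDFa(HTMLParser):
    """Collects (subject, predicate, object) tuples from a small RDFa subset."""

    def __init__(self):
        super().__init__()
        self.triples = []
        self._subjects = []   # subject in effect for each open element
        self._literal = None  # (subject, property, depth, text parts) being collected

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        attrs = dict(attrs)
        inherited = self._subjects[-1] if self._subjects else None
        subject = attrs.get("about") or inherited
        self._subjects.append(subject)
        if subject and attrs.get("rel") and attrs.get("href"):
            self.triples.append((subject, attrs["rel"], attrs["href"]))
        if subject and attrs.get("property") and self._literal is None:
            self._literal = (subject, attrs["property"], len(self._subjects), [])

    def handle_data(self, data):
        if self._literal is not None:
            self._literal[3].append(data)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self._subjects:
            return
        # Emit the literal when the element that declared the property closes.
        if self._literal is not None and len(self._subjects) == self._literal[2]:
            subject, prop, _, parts = self._literal
            self.triples.append((subject, prop, "".join(parts).strip()))
            self._literal = None
        self._subjects.pop()


def extract_triples(html: str) -> list:
    parser = TinyRDFa()
    parser.feed(html)
    parser.close()
    return parser.triples


SAMPLE = (
    '<body about="http://example.org/my-text">'
    '<p>(<a rel="dcterms:references" href="http://example.org/pettegrew-2007">Pettegrew 2007</a>).</p>'
    '<p about="http://example.org/pettegrew-2007" property="dcterms:bibliographicCitation">'
    'Pettegrew, D. (2007). In <i>Hesperia</i> 76.4: 743-784.</p>'
    '</body>'
)
for triple in extract_triples(SAMPLE):
    print(triple)
```

For real work, one of the existing RDFa extractors is the better choice. The sketch is only meant to show that the subject comes from the nearest 'about', the predicate from 'rel' or 'property', and the object from 'href' or the element's text.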

I want to stress that I don't think this determines a particular citation style. Use footnotes if that's preferable. As long as the RDFa produces triples similar to the above, your information is useful. And some degree of run-time transformation is also possible, depending on the granularity of the markup.

Tuesday, May 25, 2010


Briefly... I'm sitting in an office at NYU's Institute for the Study of the Ancient World, where I am now a Visiting Scholar. This is preliminary to a more permanent position with details to come.

My main goal is to work on issues of digital publication and on integration of diverse digital resources. I had started collaborating with ISAW-folk on these issues some time back, which is why I've been blogging about them.

I'm extremely excited to be working with my new colleagues here - a veritable dream-team of digital humanists - and am looking forward to making real progress when it comes to sharing well-structured, semantically-rich, open-licensed scholarship about the Ancient World.

And I'll still be collaborating with my long-time friends at the ANS, particularly on And field-work goes on.

Back to work...

Wednesday, May 19, 2010

RDFa Document Metadata: Authors in PLOS One

Brief follow up to yesterday's post.

Here's the HTML that indicates authorship from an example PLOS One article.
<p xmlns:xs="" xmlns:xlink="" xmlns:mml="" xmlns:aml="" class="authors" xpathlocation="noSelect">
<span rel="dc:creator"><span property="foaf:name">Harold C. Sox</span></span><sup><a href="#aff1">1</a></sup>, <span rel="dc:creator"><span property="foaf:name">Mark Helfand</span></span><sup><a href="#aff2">2</a></sup><sup><a href="#cor1" class="fnoteref">*</a></sup>,
<span rel="dc:creator"><span property="foaf:name">Jeremy Grimshaw</span></span><sup><a href="#aff3">3</a></sup>,
<span rel="dc:creator"><span property="foaf:name">Kay Dickersin</span></span><sup><a href="#aff4">4</a></sup>, <span class="capture-id">the <i>PLoS Medicine</i> Editors</span>,
<span rel="dc:creator"><span property="foaf:name">David Tovey</span></span><sup><a href="#aff5">5</a></sup>, <span rel="dc:creator"><span property="foaf:name">J. André Knottnerus</span></span><sup><a href="#aff6">6</a></sup>,
<span rel="dc:creator"><span property="foaf:name">Peter Tugwell</span></span><sup><a href="#aff7">7</a></sup>
<p xmlns:xs="" xmlns:xlink="" xmlns:mml="" xmlns:aml="" class="affiliations" xpathlocation="noSelect">
<a name="aff1" id="aff1"></a><strong>1</strong> Dartmouth Institute, Dartmouth Medical School, Hanover, New Hampshire, United States of America,
<a name="aff2" id="aff2"></a><strong>2</strong> Portland VA Medical Center and Department of Medicine, Oregon Health &amp; Science University, Portland, Oregon, United States of America,
<a name="aff3" id="aff3"></a><strong>3</strong> Clinical Epidemiology Program, Ottawa Hospital Research Institute, Ottawa, Ontario, Canada,
<a name="aff4" id="aff4"></a><strong>4</strong> Department of Epidemiology, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland, United States of America,
<a name="aff5" id="aff5"></a><strong>5</strong> The Cochrane Library, London, United Kingdom, <a name="aff6" id="aff6"></a>
<strong>6</strong> Department of General Practice, University of Maastricht, Maastricht, The Netherlands,
<a name="aff7" id="aff7"></a><strong>7</strong> Departments of Medicine, and Epidemiology and Community Medicine, University of Ottawa, Ottawa, Ontario, Canada

The basic structure is two 'p' elements, one with a 'class="authors"', the second with 'class="affiliations"'. I am trying to avoid using @class to indicate document structure and metadata, so yesterday I adopted the 'bibo:authorList' convention. But it is useful to see another instance of the nested 'rel="dc:creator"'->'property="foaf:*"' pattern. Is that beginning to look like a trend?
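
As a rough illustration of why the nested 'rel="dc:creator"'->'property="foaf:name"' pattern is convenient for software, here is a minimal Python sketch that pulls author names out of markup shaped like the PLOS One sample. It uses only the standard library and is not a real RDFa processor: it ignores prefixes, @about and everything else in the RDFa rules. The class and function names are made up for this example.

```python
from html.parser import HTMLParser


class CreatorNameParser(HTMLParser):
    """Collect the text of property="foaf:name" spans nested inside rel="dc:creator" spans."""

    def __init__(self):
        super().__init__()
        self.stack = []      # one entry per open <span>: "creator", "name" or None
        self.names = []      # author names in document order
        self._buffer = None  # text pieces of the name currently being read

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return
        attrs = dict(attrs)
        if attrs.get("rel") == "dc:creator":
            self.stack.append("creator")
        elif attrs.get("property") == "foaf:name" and "creator" in self.stack:
            self.stack.append("name")
            self._buffer = []
        else:
            self.stack.append(None)

    def handle_endtag(self, tag):
        if tag != "span" or not self.stack:
            return
        if self.stack.pop() == "name":
            self.names.append("".join(self._buffer).strip())
            self._buffer = None

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)


def creator_names(html):
    """Return the foaf:name values found inside dc:creator spans."""
    parser = CreatorNameParser()
    parser.feed(html)
    return parser.names


# Two authors copied from the sample above, trimmed for brevity.
sample = (
    '<p class="authors">'
    '<span rel="dc:creator"><span property="foaf:name">Harold C. Sox</span></span>'
    '<sup><a href="#aff1">1</a></sup>, '
    '<span rel="dc:creator"><span property="foaf:name">Mark Helfand</span></span>'
    '<sup><a href="#aff2">2</a></sup>'
    '</p>'
)
print(creator_names(sample))
```

The point is only that the names can be found from the attributes alone, with no reliance on @class.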

The relationship between author and affiliation is a little broken. The reference from each author to his/her affiliation is actually to an 'a' element with no content. An automatic agent might return an empty string as the affiliation unless it had ad hoc code to pull the text as far as the next '<a>' or '</p>' tag. That's not particularly helpful.
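
To make the problem concrete, here is a hedged Python sketch of the kind of ad hoc code an agent would need. It treats each empty 'a' element with an @id as the start of an affiliation and gathers the text that follows, up to the next 'a' start tag or the closing 'p'. The rule for skipping the 'strong' number label is my own assumption about this particular markup, and the names are hypothetical.

```python
from html.parser import HTMLParser


class AffiliationParser(HTMLParser):
    """Map each empty <a id="affN"> anchor to the text that follows it."""

    def __init__(self):
        super().__init__()
        self.affiliations = {}  # anchor id -> list of text pieces
        self.current = None     # id of the anchor whose trailing text is being collected
        self.in_label = False   # True inside <strong>, which holds only the footnote number

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a":
            # Any new <a> ends the previous affiliation; one with an id starts a new one.
            self.current = attrs.get("id")
            if self.current is not None:
                self.affiliations[self.current] = []
        elif tag == "strong":
            self.in_label = True

    def handle_endtag(self, tag):
        if tag == "strong":
            self.in_label = False
        elif tag == "p":
            self.current = None

    def handle_data(self, data):
        if self.current is not None and not self.in_label:
            self.affiliations[self.current].append(data)


def affiliations(html):
    """Return {anchor id: affiliation text} with separators trimmed."""
    parser = AffiliationParser()
    parser.feed(html)
    return {key: "".join(parts).strip(" ,\n") for key, parts in parser.affiliations.items()}


# Affiliations 5 and 6 from the sample above, including the stray line break.
sample = (
    '<p class="affiliations">'
    '<a name="aff5" id="aff5"></a><strong>5</strong> The Cochrane Library, London, United Kingdom, '
    '<a name="aff6" id="aff6"></a>\n'
    '<strong>6</strong> Department of General Practice, University of Maastricht, Maastricht, The Netherlands'
    '</p>'
)
print(affiliations(sample))
```

Code like this works only as long as the publisher never changes the layout, which is exactly why it is not particularly helpful.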

It is important to be clear that this HTML is rendered from XML encoded in the National Institutes of Health's Journal Publishing Tag Set Version 2.0. That's my way of acknowledging that the markup delivered to your browser doesn't bear the full weight of being a well-structured archival version.

Tuesday, May 18, 2010

Towards a metadata header for XHTML5+RDFa1.1 Digital Publications

XHTML5 defines elements such as 'header' and 'summary' that improve the constructs for indicating document metadata. But it is not a finished solution for embedding these concepts in a born-digital scholarly publication. In this post I take an initial crack at a decent way of doing this.

To cut to the chase, here's a sample document:
<!DOCTYPE html>
<html xmlns=""
xmlns:owl="http ://"

<title property="dc:title">Guidelines for Using XHTML5 to encode Digital Publications</title>
<base href=""/>
<div rel="bibo:authorList">
<ul rel="rdf:Seq">
<li rel="rdf:li">
By <span rel="dc:creator">
<span rel="foaf:Person">
<span property="foaf:name" rel="owl:sameAs" resource="">Albert Gallatin</span>

<li rel="rdf:li">
and <span rel="dc:creator">
<span rel="foaf:Person">
<span property="foaf:name" rel="owl:sameAs" resource="">William Alexander Hammond</span>
<summary property="dc:description" xml:lang="en">An abstract in English.</summary>
<summary property="dc:description" xml:lang="fr">Un résumé en Française.</summary>
<h1>Section 1</h1>
<p>Your text here.</p>
And here's the turtle representation of the embedded RDF:
@prefix rdf: <> .
@prefix : <> .
@prefix bibo: <> .
@prefix dc: <> .
@prefix dctypes: <> .
@prefix foaf: <> .
@prefix owl: <http ://> .

dc:description "An abstract in English."@en, "Un résumé en Française."@fr ;
dc:title "Guidelines for Using XHTML5 to encode Digital Publications" ;
bibo:authorList [
rdf:Seq _:bnode1
] ;
a dctypes:Text .

rdf:li [
dc:creator [
foaf:Person [
owl:sameAs <> ;
foaf:name "Albert Gallatin"
], [
dc:creator [
foaf:Person [
owl:sameAs <> ;
foaf:name "William Alexander Hammond"
] .

Even if you don't read turtle that might make some sense.

  • <> after the prefixes means we're defining attributes of a document at that URI.
  • The last line of this indented section just says that the document is 'a dctypes:Text' resource.
  • dc:description "...." means the RDF extractor has found the Dublin Core description (more or less used as 'abstract'). The abstract is available in two languages as indicated by the 'xml:lang' attribute in the document.
  • Same for dc:title. Note that I don't put the title in the 'body' element because html-family encoding schemes want it up in the 'head'. Perhaps the title should be repeated in the 'body/header'. I'm inclined to think that can be done on delivery to a browser when a document is published via a web-server. The archival version should not have such repetition.
  • We then come to a 'bibo:authorList', the contents of which are specified following the line beginning '_:bnode1'. The "Bibliographic Ontology" (here 'bibo') uses this construct for multi-authored works. I'm not sure I like it. Especially since it imposes the extra nesting of rdf:Seq and rdf:li. But if 'bibo' is widely adopted (which it sort of is) then it's not my place to complain. Conform to the standard and move on. The contents of each rdf:li in a bibo:authorList are not well defined in the spec. I looked through the bibo examples, adopted its use of dc:creator and foaf:Person, and then added an owl:sameAs for good measure.
My point in doing all this is to make use of existing standards that allow a corpus of born-digital scholarship to represent metadata in a machine-recognizable fashion that also allows the "text parts" to be human readable. I'm just at the beginning of this project so I welcome suggestions of where I can look for good models.

Wednesday, May 12, 2010

New KML files for hoards and mints on is the project I work on with my colleagues at the ANS and elsewhere to establish stable URIs for numismatic concepts. Development sometimes moves slowly but I've recently added new functionality for the mapping side of things. I'm facing one annoying bug but I think it's worth reporting this progress. So...

  1. will bring up the html page for the mint of Eretria in Greece. You'll see a very brief label for the site, co-ordinates, and a link to the relevant Wikipedia article.
  2. is a kml file that just shows the location of the mint.
  3. is much more fun. It shows the location of the mint plus all the mappable hoards that have Eretrian coins in them. 'Mappable' is just an indication that we haven't entered findspots for all hoards. But we're moving as fast as we can.

This pattern is generalized. and do what you would expect. shows just the findspot of the hoard. shows findspot and location of the mints of the coins found in the hoard.
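
For readers who have not looked inside a kml file, here is a minimal Python sketch of a document with a single Placemark, which is roughly what a "just the location" file contains. It is not the code that runs on the site. The function name is hypothetical, and the example uses the approximate coordinates of Athens (37° 58′ N, 23° 43′ E) rather than data taken from the project.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"


def placemark_kml(name, latitude, longitude):
    """Return a KML document string containing a single Placemark."""
    ET.register_namespace("", KML_NS)
    kml = ET.Element(f"{{{KML_NS}}}kml")
    document = ET.SubElement(kml, f"{{{KML_NS}}}Document")
    placemark = ET.SubElement(document, f"{{{KML_NS}}}Placemark")
    ET.SubElement(placemark, f"{{{KML_NS}}}name").text = name
    point = ET.SubElement(placemark, f"{{{KML_NS}}}Point")
    # KML writes coordinates as longitude,latitude, the reverse of the usual spoken order.
    ET.SubElement(point, f"{{{KML_NS}}}coordinates").text = f"{longitude},{latitude}"
    return ET.tostring(kml, encoding="unicode")


print(placemark_kml("Athens", 37.9667, 23.7167))
```

A file that shows a mint plus its hoards would simply carry more Placemark elements, or Network Links to other files.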

Open these files in Google Earth for best effect.

There are links to the related kml files on each page and I've also put <link> elements in each page's head (cf. S. Gillies' blog post for discussion).

The annoying bug is that when I show those maps on the site using the Google Maps API, not all mints or findspots appear. Not sure why that is, but I'm guessing I've got something incorrectly formatted. Or there is some limit in how many Network Links the Maps API will load in a short period of time. I'll investigate and fix.

More concisely, will show you a mint and findspots for its coins. As noted, not all information is entered; but we can begin to talk about the site and its data as a resource for mapping economic connections within the Ancient Mediterranean and Near East.

Tuesday, May 11, 2010

Document and Concept: '#this' and how DBpedia does it

I'm following up on yesterday's post in which I looked at the distinction between 'concept' and 'document' as well as its implications for scholarly practice. To be honest, I'm not sure I've really addressed the scholarly practice aspect of this thread but that's where I'm heading. I'll give a preview at the very end of this post.

Yesterday I asked, "Is there an unambiguous and widely-accepted convention for indicating the concept lying behind a document?". Gabriel Bodard left a comment noting the convention of appending '#this' to indicate that a URI is a reference to the real-world concept rather than the document describing that concept. This is definitely worth considering.

As an aside, Gabby (if I may) is correct that it's hard to look for documentation of the convention since 'this' is understandably ignored by search engines. There's the W3 document 'Cool URIs for the Semantic Web', which does discuss '#this'. I'm not sure if that's the original citation but that title is definitely on the suggested reading list for this topic. As is 'Linked Data Tutorial - NG: Publishing and consuming linked data with RDFa', which I was reminded to look at anew by Sean Gillies.

I have reservations about '#this'. Some of them are aesthetic but that's not a strong leg to stand on. Practically, I don't like having to inspect the internal characters of a URI to figure out its semantics. I also wonder if the convention hasn't really taken off. The 'Linked Data Tutorial' was published after 'Cool URIs' so it may be indicative that it doesn't discuss '#this'. I'm also not sure it's good to devote the '#' mechanism (aka fragment identifiers) to represent metadata rather than maintaining its original purpose of specifying internal portions of a document. But if '#this' comes to rule the world, I'll happily use it.

The 'Linked Data Tutorial' does use DBpedia in its examples so I want to look more closely at how that site handles the 'Document/Concept' distinction. In truth, I didn't find an explicit discussion of the topic on the DBpedia site itself. Maybe I just didn't come across it so I'd welcome a link. I did find the following on the OpenLink site: "the URI prefixes, http/ and distinguish between a resource and its HTML or RDF description documents". OpenLink is the creator of Virtuoso, the software that powers DBpedia's SPARQL-endpoint, so I'll take that statement as definitive until I find something more authoritative.

Time to get into details... is the URI for the concept 'Antioch: the ancient city'. Clicking on that URI will cause your browser to be redirected to the document . That's great. We have a clean separation between concept and document.

Looking at the source of 'page/Antioch' (I'll use that shorthand going forward) shows that this document uses RDFa to embed semantic information in human-readable html. We could switch that around. RDFa allows human-readable text to be embedded in machine-parsable data. I'm not sure it matters, which is the main point.

DBpedia even references the RDFa 1.0 DTD: <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.0//EN" "">. That's very cool and very correct. When RDFa 1.1 is published, I'm counting on DBpedia to be at the forefront of adoption.

The 'resource/Antioch' URL appears three times in the 'page/Antioch' document. The following link elements are in the header:
  • <link rel="foaf:primarytopic" href=""/>
  • <link rev="describedby" href=""/>

The body start tag looks like this:
  • <body onload="init();" about="">
Ignore the @onload, it's the @about that's interesting. It's just RDFa to say that all the parsable information in the document describes the resource .

But far more interesting to me is the 'rev="describedby"' in the quoted link element of the document's head. Note that it's 'rev', not 'rel'. The meaning of the whole element is "The current document describes the resource at". Yes, that's similar to the @about of the body. I really like the distinctiveness of using @rev. It's easily accessible by JavaScript or by an RDFa extractor. And I like that I can point to a major player in the Linked Data world as a precedent. That gives it the feel of a de facto standard. And a little googling of 'describedby' found instances on the W3 site. It seems it's not quite an officially accepted standard but, again, it's nice to see a major player possibly getting behind 'describedby'.
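
Here is a small Python sketch of what "easily accessible" could look like for a script: it scans a page for 'link' elements whose @rev includes 'describedby' and returns their @href values as candidate concept URIs. The example.org addresses and the names are placeholders of my own, not DBpedia's actual URIs, and a real RDFa extractor would do considerably more.

```python
from html.parser import HTMLParser


class DescribedByParser(HTMLParser):
    """Collect href values of <link rev="describedby"> elements."""

    def __init__(self):
        super().__init__()
        self.concept_uris = []

    def handle_starttag(self, tag, attrs):
        # Self-closing <link ... /> elements also arrive here via handle_startendtag.
        if tag != "link":
            return
        attrs = dict(attrs)
        # @rev may hold several space-separated values.
        rev_values = (attrs.get("rev") or "").split()
        if "describedby" in rev_values and attrs.get("href"):
            self.concept_uris.append(attrs["href"])


def concept_uris(html):
    """Return the URIs that the page says it describes."""
    parser = DescribedByParser()
    parser.feed(html)
    return parser.concept_uris


# Hypothetical page head; only the second link uses rev="describedby".
sample = (
    '<html><head>'
    '<link rel="alternate" type="application/rdf+xml" href="http://example.org/data/Antioch.rdf"/>'
    '<link rev="describedby" href="http://example.org/resource/Antioch"/>'
    '</head><body about="http://example.org/resource/Antioch"></body></html>'
)
print(concept_uris(sample))
```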

So it's worth asking if this is a convention that others might be willing to adopt. Any takers or comments? Is @rev too obscure? Other objections?

I also want to briefly point out that the DBpedia 'page/...' documents make some effort to be clear to human readers that they are describing resources. The link at the top of 'page/Antioch' is to 'resource/Antioch'. This could be clearer but is a start.

And as for scholarly practice, I'll just briefly say that this discussion is in part inspired by the observation that Concepts should be permanent, Documents may be temporary. Looking back to the Geonames discussion of yesterday, I will not hold it against if it stops responding to the URL . Maybe html will fall out of use someday. It will be annoying if the string of characters , ceases to mean anything. Actually, I wish they'd remove the 'sws' cruft from that URL but that's their choice. Scholarship likes permanence and to the extent that the distinction between document and concept is clearly maintained, scholarly practice will be well served.

Monday, May 10, 2010

Concept and Document in the Ancient World Semantic Web

This post is really just me taking some notes on semantic web usage. Apologies if it's too discursive but I'm just at the gathering info stage right now.

Along with some colleagues, I've been thinking about the relationships between concrete action and scholarly intent that are inherent in the links we make when creating digital publications.

First some background. Here's a "test" sentence, along with its html.

Themistocles was born in Athens.
<a href="">Themistocles</a> was born in <a href="">Athens</a>. is a document found on the Internet. As used in our sentence, it is a placeholder for Athens - nebulously defined, I admit - as a concept. Asking the question, "What is the latitude and longitude of Athens?", focuses the issue. It is not useful to respond with the location(s) of the Wikipedia servers. We clearly want to know the location of the site in "the real world", or 37° 58′ 0″ N, 23° 43′ 0″ E.

Links point to documents, but we often mean the underlying concept. Often this distinction doesn't matter. Sometimes it does, as in:

My source for the longitude and latitude of Athens is the Wikipedia article for Athens.

That sentence has the same link appearing two times, one meaning the concept, the other meaning the document. Wikipedia provides no mechanism for distinguishing between these meanings.

DBpedia does implement this distinction. But first, here's the intro sentence from the DBpedia website:
DBpedia is a community effort to extract structured information from Wikipedia and to make this information available on the Web. DBpedia allows you to ask sophisticated queries against Wikipedia, and to link other data sets on the Web to Wikipedia data.

In DBpedia, the following URLs are both valid:


The first refers to the concept, the second is a specific document. This allows for the following useful HTML:
My source for the longitude and latitude of <a href="">Athens</a> is the <a href="">DBpedia page</a>.

Looking at the DBpedia page is useful because it gives a list of resources that are each related to dbpedia:Athens via owl:sameAs. These are:

Before looking at one of these, what is owl:sameAs? The OWL Web Ontology language is described here. Among the descriptions of owl:sameAs given there is that a "...typical use of sameAs would be to equate individuals defined in different documents to one another, as part of unifying two ontologies". So the DBpedia usage, which is paralleled in many other semantic web resources, is spot on.

The reference is interesting. In part because the site has a discussion that explicitly addresses the difference between concept and document: That page also has a link to a good blog post.

DBpedia follows the Geonames guidelines in using owl:sameAs to qualify its link to , which is the Geonames URI for the concept "Athens". Clicking on that redirects you to the page Note the change of host to '' and the addition of 'athens.html'. The serial number remains the same.

Here is a screen grab of the "balloon" that is displayed next to the icon indicating the location of Athens.

There are two interesting links shown in this image: 'perma link' and 'semantic web rdf': is just the link to the page. is an RDF document. It's worth looking at the source to see the attribute 'rdf:about=""'. URLs of the pattern 'http:...about.rdf' are documents. is a concept.

Even with this soup of web addresses, there is a lot that Geonames is doing right. The only missed opportunity I see is no explicit indication in the "264371/athens.html" page of the concept address. There is the following: '<link rel="alternate" type="application/rdf+xml" title="RDF Version" href="" />'. This is a link to a document, not a concept. And 'alternate' is too vague for me to know that I can parse that RDF to find its @about value.
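
Once an agent does know to fetch the RDF, finding the @about value is short work. The following Python sketch assumes the RDF/XML has already been downloaded into a string. The sample document, its 'ex' namespace and its URI are invented for illustration and are not Geonames' actual output.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def described_resources(rdf_xml):
    """Return every rdf:about value found in an RDF/XML string, in document order."""
    root = ET.fromstring(rdf_xml)
    about = f"{{{RDF_NS}}}about"
    return [element.attrib[about] for element in root.iter() if about in element.attrib]


# Hypothetical RDF document describing a single place.
sample = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:ex="http://example.org/ontology#">
  <ex:Feature rdf:about="http://example.org/place/1/">
    <ex:name>Athens</ex:name>
  </ex:Feature>
</rdf:RDF>"""
print(described_resources(sample))
```

The catch is the extra round trip: the html page has to be parsed, the RDF fetched, and only then is the concept URI in hand.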

It would be nice if there were something like '<link rel="concept" type="application/rdf+xml" title="Concept URI" href="" />'. I'm not too concerned with what's in @type so I left it as is. But 'concept' is not in any way standard. I just made it up.

If this post has a point, that's it. Make it really easy for me to figure out which URI is for the concept, because that's the one I really want to use. Or maybe I should end with a question. Is there an unambiguous and widely-accepted convention for indicating the concept lying behind a document? If not, we need one.

Wednesday, April 7, 2010

Screen shot of Troy pottery on iPad

Quick post...

Here's a screen shot of Greek, Roman and Byzantine Pottery at Ilion (Troia) on an iPad. This is in the default reader iBooks.

You can click through to the Flickr page to get the full-size image. This gives a sense of the dimensions of the iPad screen.

OK, lots of work to do to make the page layout better, but I will admit to thinking this is cool.

Some annoyances: This volume is co-edited with Billur Tekkök but only "S. Heath" is showing up at the top. The underlined links don't work. That's also the case in the Stanza ePub reader on my iPhone. Desktop ePub readers do follow links (or at least some of them do). I think they all should, regardless of device.

There will be more to say so stay tuned. The brief "take away" is that eReaders will be one channel for the distribution of archaeological data. I look forward to RadioShack and the like selling color devices for somewhere between $79.00 and $179.00.

p.s. You can get the file here: .

Saving Archaeology in Italy

The US State Department's Cultural Heritage Center circulated the following e-mail yesterday. I'll have more to say on this soon, but for those of you already inclined to write a letter in support of extending the MoU with Italy that protects its archaeological resources, the deadline is April 22 and you can send your comments in as an attachment to <>.

From: "Cultural Property" <>
Date: April 6, 2010 11:26:52 AM EDT
To: "Cultural Property" <>
Subject: Meeting of the Cultural Property Advisory Committee

The Department of State’s Cultural Heritage Center would like to draw
your attention to an announcement that will be published in tomorrow’s
Federal Register

[Billing Code: 4710-05]
[Public Notice 6945]

Notice of Meeting of the Cultural Property Advisory Committee
In accordance with the provisions of the Convention on Cultural Property
Implementation Act (19 U.S.C. § 2601 et seq.) (the Act) there will be a meeting of
the Cultural Property Advisory Committee on Thursday, May 6, 2010, from 9:00
a.m. to approximately 5:00 p.m., and on Friday, May 7, 2010, from 9:00 a.m. to
approximately 3:00 p.m., at the Department of State, Annex 5, 2200 C Street, N.W.,
Washington, D.C. During its meeting the Committee will review a proposal to
extend the “Memorandum of Understanding Between the Government of the United
States of America and the Government of the Republic of Italy Concerning the
Imposition of Import Restrictions on Categories of Archaeological Material
Representing the Pre-Classical, Classical and Imperial Roman Periods of Italy”
signed in Washington, D.C. on January 19, 2001 and amended and extended in 2006
through an exchange of diplomatic notes. The purpose of this review is for the
Committee to make findings and a recommendation regarding the proposal to extend
this Memorandum of Understanding.

The Committee’s responsibilities are carried out in accordance with
provisions of the Act. The U.S. – Italy Memorandum of Understanding, as
amended and extended, the Designated List of restricted categories, the text of the
Act and related information may be found at

Exercising delegated authority from the President and the Secretary of State,
I have determined that portions of the meeting on May 6 and 7 will be closed
pursuant to 5 U.S.C. § 552b(c)(9)(B) and 19 U.S.C. § 2605(h), because the
disclosure of matters involved in the Committee’s proceedings would compromise
the Government’s negotiation objectives or bargaining positions on the
negotiations of this Memorandum of Understanding. However, on May 6, the
Committee will hold an open session, 9:30 a.m. to approximately 11:30 a.m., to
receive oral public comment on the proposal to extend the Memorandum of
Understanding. Persons wishing to attend this open session should notify the
Cultural Heritage Center of the Department of State at (202) 632-6301 by
Thursday, April 22, 2010, 5:00 p.m. (EDT) to arrange for admission, as seating is
extremely limited.

Those who wish to make oral presentations should request to be scheduled
and submit a written text of the oral comments by Thursday, April 22, 2010, to
allow time for distribution of these comments to Committee members for their
review prior to the meeting. Oral comments will be limited to five minutes each or
less to allow time for questions from members of the Committee and must
specifically address the determinations under section 303(a)(1) of the Act, 19
U.S.C. § 2602(a)(1), pursuant to which the Committee must make findings. This
citation for the determinations can be found at the web site noted above. The
Committee also invites written comments and asks that they be submitted no later
than April 22, 2010. All written materials, including the written texts of oral
statements, should be faxed to (202) 632-6300, if 5 pages or less. Written
comments greater than five pages in length must be duplicated (20 copies) and
mailed to Cultural Heritage Center, SA-5, Fifth Floor, Department of State,
Washington, D.C. 20522- 0505. Express mail is recommended for timely delivery.

Date: MAR 29 2010 _______________________
Judith A. McHale
Under Secretary
Public Diplomacy and Public Affairs
Department of State
[FR Doc. 2010-7898 Filed 04/06/2010 at 8:45 am; Publication Date: 04/07/2010]

Tuesday, March 30, 2010

Of iPads, ePub and Troy

Apple's iPad will start arriving in customers' hands on April 3rd, and at least some of them will end up in the hands of Mediterranean archaeologists. That sounds like an opportunity so I've spent the last few days playing around with ePub software.

I can quickly say that all the desktop readers that I've seen are terrible. Adobe Digital Editions? Ugly. The Stanza desktop version? Ugly and doesn't show my images. Lovely Reader looks OK but does a bad job of laying out text and image.

The Stanza reader on the iPhone is decent and that's what I've been testing against.

The most useful application is far-and-away Calibre. Once I figured out how to install the command-line tools, converting xhtml to epub is a single step. There are lots of options to play with, but the basic idea is simple.
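
For anyone who wants to script that single step, here is a minimal Python sketch that wraps Calibre's 'ebook-convert' command-line tool, which takes an input file and an output file and infers the formats from the extensions. The file names are placeholders, no optional flags are shown, and the wrapper functions are my own invention rather than part of Calibre.

```python
import shutil
import subprocess


def build_convert_command(source, target):
    """Return the argument list for Calibre's ebook-convert tool."""
    return ["ebook-convert", source, target]


def convert(source, target):
    """Run the conversion; return False if ebook-convert is not on the PATH."""
    if shutil.which("ebook-convert") is None:
        return False
    # check=True raises CalledProcessError if Calibre reports a failure.
    subprocess.run(build_convert_command(source, target), check=True)
    return True


# Hypothetical file names; this only prints the command that would be run.
print(" ".join(build_convert_command("pottery.xhtml", "pottery.epub")))
```

Calling convert("pottery.xhtml", "pottery.epub") would then perform the conversion on a machine where the command-line tools are installed.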

So... I spent a little time simplifying the xslt stylesheets that convert the database for Greek, Roman and Byzantine Pottery at Ilion into xhtml. Then I pointed Calibre at the resulting file. You can download an early version of the results here. Don't expect too much. No TOC, bad spacing so it's hard to distinguish catalog entries, other varied problems. But all will improve over time.

To get this on an iPhone you need to load Stanza on to it. Then "Get Books" -> "Downloads" -> "Edit" -> "Download Book from URL". Typing a long URL is a pain on the iPhone so enter this:

Again, it's super-preliminary so keep an eye out for improvements. And if anybody gets this onto an iPad, let me know how it looks. I'll try it myself soon enough but not right on April 3rd.

Tuesday, March 23, 2010

SPARQL Based Navigation of RDFa Encoded Named Entities

This is a quick heads up on a new feature at, the ANS-hosted project assigning stable URIs to numismatic concepts.

At you'll find a very brief representation of the ancient site of Lyttus in Crete. It links to the relevant Wikipedia article and Barrington Atlas ID so it should be unambiguous which site we mean and it should be easy to find out further information. As a convenience, and to make it easy to put a dot on the map, the page also has latlong info.

This post is about the list of URIs prefaced by the text "The following Nomisma IDs refer to this ID:". If you click on, you'll get a description of a hoard of coins as published in Inventory of Greek Coin Hoards. uses RDFa so the markup of the hoard includes the snippet: <span rel="nm:mint" resource="lyttus">Lyttus: 1 dr.</span> . You can click on "Show Markup in Page" to see this.

All the descriptions of numismatic concepts are collected in a single RDFa file at and as RDF-XML at That one snippet from igch0151 will produce the triple:

So... visiting queries with a SPARQL statement of the form ' SELECT ?id WHERE { ?id ?refersto <> }'. This just binds "?id" to a list of the ids that refer to Lyttus.
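
To show what that query pattern does without needing a SPARQL endpoint, here is a small Python sketch that applies the same match to a list of triples held in memory. Every URI below is a made-up example.org placeholder, including the one standing in for 'nm:mint', and the function is an illustration of the matching logic, not the software the site uses.

```python
def subjects_referring_to(triples, target):
    """Mimic SELECT ?id WHERE { ?id ?refersto <target> } over in-memory triples.

    Like the SPARQL query, this returns one result per matching triple,
    whatever the predicate, and does not remove duplicates.
    """
    return [subject for subject, _predicate, obj in triples if obj == target]


# Hypothetical data: two hoards mention Lyttus, one mentions another mint.
LYTTUS = "http://example.org/id/lyttus"
MINT = "http://example.org/ns#mint"
triples = [
    ("http://example.org/id/igch0151", MINT, LYTTUS),
    ("http://example.org/id/igch0152", MINT, "http://example.org/id/eretria"),
    ("http://example.org/id/igch0153", MINT, LYTTUS),
]
print(subjects_referring_to(triples, LYTTUS))
```

Leaving the predicate as a variable is what lets one query pick up every kind of reference to an ID, not just mints.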

Bottom line: Simple markup achieves meaningful results using pre-existing standards. I wrote none of the tools to make this work. It will be fun when I get around to turning that list of IDs into a map, which will be simple using the Google Maps API. Then we will have a geographic front-end to "SPARQL-based Navigation of RDFa Encoded Named Entities".

Note: all of the RDFa patterns are in the process of being defined and the entries are in the process of being marked up. I.e., this is all in alpha stage.

Tuesday, March 16, 2010

The Relative Value of Oil and Wine in the Talmud

For the last week I've been following Daf Yomi, the 7-year cycle by which Orthodox Jews read the entire Talmud. It's part of my "Echoes of Late Antiquity" hobby and so far I'm having fun. Take this translated quote from Sanhedrin 31a:
If one witness attests [the loan of] a barrel of wine, and the other, of a barrel of oil: — such a case happened, and it was brought before R. Ammi, who ordered him [the defendant] to repay a barrel of wine out of [the value of] the barrel of oil.
So a "barrel" of oil is worth more than the same of wine. That's nice to know. Of course, I'm relying on the translation from and that's always a worry.

FWIW, the legal principle here is that you need two witnesses. Since the value of the oil is higher, there are only two witnesses to the loan of the value of the barrel of wine.

Friday, February 26, 2010

Coming to terms with HTML 5

I haven't heard much talk among digital humanists about HTML 5. If I've missed something please let me know.

I will admit that for a long time I sort of ignored it. I was interested in xhtml 2 but that's dead. And when the html 5 discussions began, xhtml seemed like a barely tolerated intruder. That's clearly less so currently. Then there was the dismissive attitude of the "5" folk towards RDFa. Everybody seems to be talking now and that's good.

So it looks like there will be an XHTML5 that directly supports RDFa. I'm assuming that means in the DOM as it's made available to Javascript. (Somebody tell me if I'm wrong about that.)

With this in mind, I spent the day catching up with developments in the html 5 community. Sometimes focusing on integration with RDFa but also just catching up.

This series of articles by "boblet" was well-written and useful. On the RDFa front, I read Mark Birbeck's discussion about tokenizing RDFa. Likewise interesting. And see the RDFa section of the HTML5 page.

I like that html 5 supports structures along the lines of:

<h1>Next section</h1>
Add in more xhtml 1.0 bits and you can really think about doing a nice job of publishing prose works digitally with the html5 vocabulary. And don't forget the '<article>' element. That looks interesting as well.

Not all is perfect. I've always been bummed that the title element goes in the head of an (x)html document. That means that if you want it to show up in the document part of a browser window, you have to repeat it. There's some silliness there. Why can't a title element go anywhere? And would it really be a problem if a document had more than one title in it? I can think of use-cases where that works: more than one article in a single html file, or a list of objects that have titles.

And there's still no preferred way of doing footnotes. The section in the spec 4.6.26 is sort of a punt. The boblet articles suggest <aside> for footnotes but that isn't encouraged in the spec. I see that there's a "note" value for the rel attribute on the WHATWG RelExtensions page. That list is an official part of the html 5 spec (see "Other Link Types"). But the spec is totally vague on how a proposed rel moves to actual approval.

And anybody using xhtml is still going to have lots of decisions about what goes in class attributes and how to specify lots of basic things like 'author'. That smacks of being proprietary. How much can Dublin Core help with this?

So... it was a day of mostly reading. I added a little bit of xhtml 5 to the git repository under an xhtml5 branch but only just a hint of what I should do to really "commit" to such a big change.

Tuesday, February 23, 2010

OpenCyc + Wiki/DB-Pedia and Ancient World References

This is another post in the Ancient World RDFa series.

I'm writing now because I have two questions in mind, one fairly general and one very specific:
  • Is there a pre-existing ontology that I can use to identify concepts found in Ancient World scholarship?
  • How can I indicate the office of "strategos" that was held by the sophist Polemon?
The topic comes up because I'm faced with the sentence fragment:
Polemon also appears as strategos on coins of Hadrian...
Again, how to mark the text "strategos" so that it is identified as the ancient office. Here's what I have so far:
typeof="skos:Concept opencyc:PublicOffice"
That gives the following RDF/Turtle:
a opencyc:PublicOffice, skos:Concept ;
owl:sameAs dbpedia:Strategos ;
rdfs:label "strategos"@en .
In short, this says that there's an instance of a public office and that office is "strategos".

The "opencyc" namespace maps to "". You can read about OpenCyc at, where you'll be told that OpenCyc is an "ontology containing hundreds of thousands of terms, along with millions of assertions relating the terms to each other, forming an ontology whose domain is all of human consensus reality." Even accounting for "commercial-speak", this could be useful. And yes, it's based on a commercial product, but CC-Licensed versions of the whole thing can be downloaded from

The landing place for PublicOffice is "Mayor" and "Ambassador" are example instances of PublicOffice so I'm comfortable using it as the type for Strategos. But "Strategos" itself is not in OpenCyc. I think this will be a common situation: knowledge bases intended for the modern world will have many useful analogs for concepts that appear in Ancient World scholarship, but the specific vocabulary will be missing.

OpenCyc has entries forYou can replace many narrowly scoped namespaces with these and other concepts that appear in OpenCyc.

But again, no "Strategos". This is where Wikipedia (via DBPedia) comes in. Here's the Wikipedia article. I map that into the Semantic Web via DBPedia.

So here's a basic principle: OpenCyc is the default ontology, DBPedia is the default vocabulary. I think that plays to the strengths of each resource.

Neither is complete for the Ancient World. That's probably more of a problem for the use of OpenCyc. DBPedia doesn't have a page for the ceramic type "Eastern Sigillata A". If I write one for Wikipedia, that will eventually migrate to DBPedia. OpenCyc doesn't have an easy route for community-based editing. Will the concepts "Excavation Unit" or "Survey Collection Unit" be necessary? Probably. That means coming up with or finding an ontology for those.
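The "OpenCyc for types, DBPedia for terms" principle, and the gaps in both, can be sketched in a few lines of Python. The two dictionaries are hypothetical stand-ins holding only what this post mentions. They are not real queries against either resource, and the function name describe is mine.

```python
# Toy lookup tables: hypothetical stand-ins for OpenCyc and DBPedia lookups.
OPENCYC_CLASSES = {"office": "opencyc:PublicOffice"}
DBPEDIA_TERMS = {"strategos": "dbpedia:Strategos"}


def describe(term, kind):
    """Return (type, identity) for a term.

    The type comes from the ontology (what kind of thing this is) and the
    identity from the vocabulary (which specific thing it is). Either can
    be None when the resource has no entry.
    """
    rdf_type = OPENCYC_CLASSES.get(kind)
    identity = DBPEDIA_TERMS.get(term.lower())
    return rdf_type, identity


print(describe("Strategos", "office"))
print(describe("Eastern Sigillata A", "ceramic type"))
```

A None in either position marks the kind of gap discussed above: a missing identity can be filled by writing a Wikipedia article, while a missing type means finding or writing an ontology.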

Thursday, February 4, 2010

Ancient World Digital Publishing Test Suite

This post is just a brief notice that I have begun a test suite of xhtml+rdfa and related documents to facilitate my work on digital publication for ancient world scholarship. It's very much "pre-release" at this point so I'm putting the suite out there for the sake of sharing, not because it's useful in its current state.

Right now, there are a few files in a git repository at To download, try

As the files become more useful, I'll talk more about what I'm trying to achieve with this project.

Tuesday, January 26, 2010

RDFa Patterns for Ancient World References

I am continuing to experiment with semantic links within digital publications relevant to the Ancient World. Here's a snippet from the same article I drew from in the last post.
In 124, Polemon had spoken before Hadrian and persuaded him to make a gift of money and grant a series of honors to Smyrna, not least of which was a second temple to the imperial cult (IvS 697; Burrell 2004: 42-48).
The "things" I want to identify are:
  • The year 124 as an event.
  • The sophist Polemon
  • The emperor Hadrian
  • The imperial cult
  • And the two citations
And I want to do this in a standards-based way that is automatically recognizable by third parties (or at least their software agents).

As before, I'm using RDFa. In a future post, I'll explain this choice and talk about what RDFa and RDF are, but for now I'm diving right in.

The relevant namespaces that I'm using are:
  • xmlns:dbpedia=""
  • xmlns:cito=""
  • xmlns:ev=""
  • xmlns:ex=""
  • xmlns:foaf=""
  • xmlns:frbr=""
  • xmlns:geo=""
  • xmlns:owl="http ://"
  • xmlns:rdfs=""
  • xmlns:skos=""
  • xmlns:xsd=""
All the markup that follows is experimental and comments are welcome, of course.

The reference to Polemon now looks like:
<span id="id2209"
about="#id2209"
typeof="skos:Concept foaf:Person"
rel="owl:sameAs cite"
resource="[dbpedia:Polemon_of_Laodicea]"
property="rdfs:label">Polemon</span>

With the '<head>' of the document including '<base href=""/>', that RDFa gives the following RDF/Turtle:

owl:sameAs dbpedia:Polemon_of_Laodicea ;
a skos:Concept, foaf:Person ;
<> dbpedia:Polemon_of_Laodicea ;
rdfs:label "Polemon"@en .
Some observations:
The pairing of 'id' and 'about' attributes means that I can identify a span of text and then say things about it.

I then give that span a type. Here I say that it's a skos:Concept and a foaf:Person. Which concept and which person? 'skos:Concept' will be used on all named-entities, and their nature will be further qualified when it's useful.

Why 'owl:sameAs'? Here I follow the usage of If you look at the Polemon page, you'll see the same construct used to make the link to Freebase. 'owl:sameAs' also underlies (see the N3 for Hadrian).

The metaphor here is that I am instantiating Polemon as a concept and person present in the text. That should be recognizable and actionable. There is some redundancy in how I go about doing it, but that is in the spirit of convenience for future processors of this data.
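As a rough idea of what "recognizable and actionable" could mean for a software agent, here is a small Python sketch that uses only the standard library's html.parser to pull the RDFa attributes out of the Polemon span, as it appears in the full marked-up sentence in this post. The class name SpanCollector is made up, and this is an attribute scraper rather than an RDFa processor: it ignores nested annotated spans, base URIs, and prefix resolution.

```python
from html.parser import HTMLParser

# The Polemon span, copied from the full marked-up sentence in this post.
SNIPPET = (
    '<span id="id2209" about="#id2209" typeof="skos:Concept foaf:Person" '
    'resource="[dbpedia:Polemon_of_Laodicea]" rel="owl:sameAs cite" '
    'property="rdfs:label">Polemon</span>'
)


class SpanCollector(HTMLParser):
    """Collect the attributes and inline text of each <span> that has @about."""

    def __init__(self):
        super().__init__()
        self.spans = []
        self._open = None  # the annotated span currently being read, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "span" and "about" in attrs:
            self._open = {**attrs, "text": ""}

    def handle_data(self, data):
        if self._open is not None:
            self._open["text"] += data

    def handle_endtag(self, tag):
        if tag == "span" and self._open is not None:
            self.spans.append(self._open)
            self._open = None


collector = SpanCollector()
collector.feed(SNIPPET)
collector.close()

polemon = collector.spans[0]
print(polemon["about"] == "#" + polemon["id"])  # do 'id' and 'about' pair up?
print(polemon["typeof"].split())                # the types given to the span
print(polemon["text"])                          # the inline text used as label
```

The first check is the 'id'/'about' pairing described above: the span is addressable by its id, and 'about' makes that same fragment the subject of the statements.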

"In 124"
This looks like:
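<span id="id3724"
about="#id3724"
typeof="skos:Concept frbr:Event"
rel="owl:sameAs"
resource="dbpedia:124"
property="ev:startdate"
datatype="xsd:year"
content="124">In 124</span>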
Same basic process. I isolate some text as individually addressable. I say what it is, in this case a FRBR Event. Here I also embed a machine-readable property, the start date, into the document, but retain the inline text as the label.

But I am probably on less firm ground here. I use FRBR because it's an LOC-approved standard. I annotate the event with an RSS Event property, and that's a little weak. It might also seem odd to equate the event with the DBPedia representation of the year 124. If you follow through to the Wikipedia version, though, it does refer to Hadrian's trip east, which is the setting for Polemon's speech. In the case of a better-known event, I think I'd prefer to link to a representation of that, for example The 'owl:sameAs' on that page will eventually redirect you to the right Wiki page.

Here's the RDF/Turtle produced by the above RDFa:
owl:sameAs <dbpedia:124> ;
ev:startdate "124"^^xsd:year ;
a frbr:Event, skos:Concept .
As above, the goal is for this to be usable in a number of contexts.
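The interplay of the content and datatype attributes with the inline text is worth spelling out. As I understand the RDFa rules, a content attribute takes precedence over the text inside the element when the property's value is built, and datatype types the resulting literal. A minimal Python sketch of that rule follows. The function name property_literal is made up, and the English language tag on plain literals is an assumption about the document's language.

```python
def property_literal(attrs, inline_text):
    """Build the Turtle literal for an RDFa @property.

    `attrs` is a dict of the element's attributes. A @content value,
    when present, is used instead of the element's inline text.
    """
    value = attrs.get("content", inline_text)
    datatype = attrs.get("datatype")
    if datatype:
        return f'"{value}"^^{datatype}'
    # Assumption: untyped literals take the document language, here English.
    return f'"{value}"@en'


in_124 = {"property": "ev:startdate", "datatype": "xsd:year", "content": "124"}
print(property_literal(in_124, "In 124"))
print(property_literal({"property": "rdfs:label"}, "Polemon"))
```

So the reader still sees "In 124" on the page while the triple carries the bare value "124". I keep 'xsd:year' here exactly as it is used in the markup; the XML Schema datatype for a calendar year is named xsd:gYear.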

There are two inline references at the end of the sentence. The first is to a primary source, an inscription at Smyrna as published in Petzl, G. (1982). Die Inschriften von Smyrna. Bonn: Habelt. The second is to Barbara Burrell's book: Burrell, B. (2004). Neokoroi: Greek cities and Roman emperors. Cincinnati classical studies, new ser., v. 9. Leiden: Brill.

Here's the RDFa for the second:
<span id="id4616"
about="#id4616"
typeof="ex:Citation"
rel="cito:citesAsAuthority cite"
resource=""
property="rdfs:label">Burrell 2004: 42-48</span>
This markup is similar to what came before, except that I'm not instantiating it as a 'skos:Concept'. I am using the CITO ontology to indicate the relationship between the works, but note that I'm currently making up the type 'ex:Citation'. Perhaps I could use 'cito:Document' but that doesn't seem quite right. I really want to mark this span of text as being a citation but haven't found just the right RDF vocabulary. I looked at BIBO but, like CITO, it doesn't have the exact class I want. BIBO is linked with Zotero so I'd like to use it. For now, CITO has a more detailed set of relationships between citing and cited documents so I'm going with that. WorldCat also isn't great because there's confusion about the 'terms of use' but it will do for this experimental phase.

Here's the RDF/Turtle:
cito:citesAsAuthority <> ;
a ex:Citation ;
<> <> ;
rdfs:label "Burrell 2004: 42-48"@en .

The RDFa for the epigraphic reference looks like:
<span id="id9773"
about="#id9773"
typeof="ex:Citation"
rel="cito:citesAsAuthority ex:citesAsPrimarySource"
resource=""
property="rdfs:label"><i>IvS</i> 697</span>
The main difference here is that I'm also making up the 'ex:citesAsPrimarySource' value for the rel attribute. The concept of "Primary Source" and references thereto is important for the Humanities and we need a way of indicating its usage.

It's also important that I'm referring to the publication of the inscription, not the inscription itself. When a digital surrogate becomes available, I can point to that. In the meantime, a way of standardizing references to parts of a work would be useful. But I don't think you can just tag on a fragment identifier, as in, since the implication there is that such an ID actually exists. And it might be rude to put the same after a '?'. Something to ponder...
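The difference between the two options can be seen with Python's urllib.parse. A fragment is resolved by the client and is not part of the request, whereas a query string is sent to the server, which then has to do something with a parameter it never defined. The URL below is a hypothetical placeholder, not the real address of the publication, and the parameter name "item" and the function name what_server_sees are made up as well.

```python
from urllib.parse import urlsplit

# Hypothetical URL standing in for the published work.
work = "http://example.org/works/ivs"

with_fragment = work + "#697"
with_query = work + "?item=697"


def what_server_sees(uri):
    """Return the path-plus-query part of a URI, i.e. what goes in the request."""
    parts = urlsplit(uri)
    return parts.path + ("?" + parts.query if parts.query else "")


print(what_server_sees(with_fragment))   # the fragment is dropped
print(what_server_sees(with_query))      # the query is passed along
print(urlsplit(with_fragment).fragment)  # the fragment stays with the client
```

In other words, a fragment makes a claim about the structure of the returned document, while a query makes a request of somebody else's server. Neither is really mine to define for a resource I don't control.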

Instead of continuing on with each named entity, here's the whole sentence with RDFa visible:
<span id="id3724" about="#id3724" typeof="skos:Concept frbr:Event" rel="owl:sameAs" resource="dbpedia:124" property="ev:startdate" datatype="xsd:year" content="124">In 124</span>, <span id="id2209" about="#id2209" typeof="skos:Concept foaf:Person" resource="[dbpedia:Polemon_of_Laodicea]" rel="owl:sameAs cite" property="rdfs:label">Polemon</span> had spoken before <span id="id5130" about="#id5130" typeof="skos:Concept foaf:Person" rel="owl:sameAs cite" resource="[dbpedia:Hadrian]" property="rdfs:label">Hadrian</span> and persuaded him to make a gift of money and grant a series of honors to <span id="id39156" about="#id39156" typeof="skos:Concept geo:SpatialThing" rel="owl:sameAs cite" resource="" property="rdfs:label">Smyrna</span>, not least of which was a second temple to the <span id="id4168" about="#id4168" typeof="skos:Concept dbpedia:Religion" rel="owl:sameAs cite" resource="dbpedia:Imperial_cult_(ancient_Rome)]" property="rdfs:label">imperial cult</span> (<span id="id9773" about="#id9773" typeof="ex:Citation" rel="cito:citesAsAuthority ex:citesAsPrimarySource" resource="" property="rdfs:label"><i>IvS</i> 697</span>; <span id="id4616" about="#id4616" typeof="ex:Citation" rel="cito:citesAsAuthority cite" resource="" property="rdfs:label">Burrell 2004: 42-48</span>).
And here's the RDF/Turtle:

owl:sameAs <dbpedia:124> ;
ev:startdate "124"^^xsd:year ;
a frbr:Event, skos:Concept .

owl:sameAs dbpedia:Polemon_of_Laodicea ;
a skos:Concept, foaf:Person ;
<> dbpedia:Polemon_of_Laodicea ;
rdfs:label "Polemon"@en .

owl:sameAs dbpedia:Hadrian ;
a skos:Concept, foaf:Person ;
<> dbpedia:Hadrian ;
rdfs:label "Hadrian"@en .

owl:sameAs <> ;
a geo:SpatialThing, skos:Concept ;
<> <> ;
rdfs:label "Smyrna"@en .

owl:sameAs <dbpedia:Imperial_cult_(ancient_Rome)]> ;
a dbpedia:Religion, skos:Concept ;
<> <dbpedia:Imperial_cult_(ancient_Rome)]> ;
rdfs:label "imperial cult"@en .

ex:citesAsPrimarySource <> ;
cito:citesAsAuthority <> ;
a ex:Citation ;
rdfs:label "<i>IvS</i> 697"^^rdf:XMLLiteral .

cito:citesAsAuthority <> ;
a ex:Citation ;
<> <> ;
rdfs:label "Burrell 2004: 42-48"@en .
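One detail in that output is worth a note: some objects come out as prefixed names (dbpedia:Hadrian) and others as bracketed URIs (<dbpedia:124>). In RDFa, the about and resource attributes take either a URI or a "safe CURIE" wrapped in square brackets, so a value without both brackets is read as a URI and passed through untouched. Here is a minimal Python sketch of that rule. The function name resolve_resource is made up, and the DBPedia namespace URI is my assumption of the usual one rather than something taken from the declarations above.

```python
def resolve_resource(value, prefixes):
    """Resolve an @about/@resource value.

    '[prefix:ref]' is a safe CURIE and is expanded through the prefix map.
    Anything else is treated as a URI reference and returned unchanged.
    """
    if value.startswith("[") and value.endswith("]"):
        prefix, _, ref = value[1:-1].partition(":")
        return prefixes[prefix] + ref
    return value


# Assumed namespace for the "dbpedia" prefix.
PREFIXES = {"dbpedia": "http://dbpedia.org/resource/"}

for value in (
    "[dbpedia:Hadrian]",
    "dbpedia:124",
    "dbpedia:Imperial_cult_(ancient_Rome)]",
):
    print(value, "->", resolve_resource(value, PREFIXES))
```

Only the first value is expanded. The other two are taken as URIs whose scheme happens to be "dbpedia", stray bracket included, which is why they show up inside angle brackets in the Turtle.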

Some of these constructs deserve more comment, but this post is getting long. The only thing to add is that fairly soon I will publish a JavaScript toolset that starts making use of these patterns.