Wednesday, February 23, 2011

Test Bed for (X)HTML Conventions for Scholarly Publication

The main reason I joined the Institute for the Study of the Ancient World at NYU was to be part of initiating a program of digital publication of peer-reviewed scholarship. We haven't announced anything formally and this blog post isn't that announcement. It is the beginning of a nuts-and-bolts conversation about the markup of digital scholarship that is intended to encourage long-term viability, flexible re-use, and easy display (among many other things).

To get right down to business, is the very temporary URL for a preprint version of "Review of Ptolemaic Numismatics, 1996 to 2007" by Catherine Lorber and Andrew Meadows. I'm very grateful to Andy and Cathy for their willingness to be part of this experiment. Their work is largely done. Now it's up to me to make progress on the markup and I'm hoping to do that in a very public way.

But where to begin the conversation? I think the best approach is to admit I'm in the middle of things and just start laying out issues and thoughts. Keep in mind that everything is subject to change...
  • The format for ISAW digital publications is XHTML with RDFa. XHTML (for now 1.1 but moving to XHTML5) is a widely supported standard with excellent tooling that is directly viewable in many contexts. That makes it appropriate for long-term archival storage of born-digital scholarship.
  • Internal reference structures are important. For now this means each <p> element has an @id. Each <div> of class 'section' also has an @id attribute. This is in anticipation of using the semantic elements of HTML5.
  • Named entities will be tagged with links to stable resources describing those entities. For geography, Pleiades. For many other entities, Wikipedia. See below for RDFa patterns.
  • Existing ontologies/vocabularies will be used whenever possible. Geographic entities are typed as "dcterms:Location". That sort of thing.
  • Basic constructs for marking up bibliography and footnote-like structures are lacking for HTML-based markup languages. There are lots of semi-complete "best practices" but narrowing these down to a consistent and flexible convention will be an important process.
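
The @id convention in the list above is easy to check mechanically. Here is a minimal sketch in Python, assuming the document is well-formed XHTML; the function name and the sample markup are made up for illustration and are not taken from the actual ISAW files.

```python
import xml.etree.ElementTree as ET

def elements_missing_id(xhtml_text):
    """Return the tag names of <p> and <div class="section"> elements lacking an id."""
    root = ET.fromstring(xhtml_text)
    missing = []
    for el in root.iter():
        # Strip the namespace so the check works with or without one.
        tag = el.tag.split("}")[-1]
        is_paragraph = tag == "p"
        is_section = tag == "div" and "section" in el.get("class", "").split()
        if (is_paragraph or is_section) and not el.get("id"):
            missing.append(tag)
    return missing

# Hypothetical sample: one paragraph and one section div are missing their ids.
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml"><body>
<div class="section" id="s1"><p id="p1">First paragraph.</p><p>Second paragraph.</p></div>
<div class="section"><p id="p3">Third paragraph.</p></div>
</body></html>"""

print(elements_missing_id(SAMPLE))
```

An empty list means every paragraph and section can be addressed by a fragment identifier.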

Looking ahead:
  • Multiple formats will be supported. We will distribute this text as "raw" valid XHTML. It will also be hosted in a more interactive environment that does slick things like make maps, etc. EPUB, PDF... all those are coming. Again, the ease with which a base XHTML representation can be converted to these other formats is one reason to use XHTML.
  • We'll use CC licenses. Right now the document is CC-BY-NC-ND. We'll drop the ND eventually, perhaps the NC as well. The preprint is ND as a signal that a better version is coming from us.

A word on RDFa (a standardized way of embedding information in XHTML pages)...

The basic pattern that I'm using to mark up named entities is illustrated by the sentence:
In a study of tax receipts from early Ptolemaic <a class="citation"
typeof="dcterms:Location" rel="iana:describedby"

That produces the RDF/Turtle
[ a dcterms:Location ;
rdfs:label "Thebes"@en ;
iana:describedby <>].
You can see the turtle for the whole document at

An "English" equivalent of the turtle snippet is 'There is a site in the text with label "Thebes" and a description at'
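
To make the mapping from markup to Turtle concrete, here is a rough sketch in Python. It is not a real RDFa processor: it just reads the @typeof, @rel and @href attributes of a single anchor and prints the matching blank node. The href (http://example.org/thebes) is a placeholder, and taking the label from the link text with an "en" language tag is an assumption made for illustration.

```python
import xml.etree.ElementTree as ET

def anchor_to_turtle(anchor_xml, lang="en"):
    """Build a Turtle blank node from one RDFa-style <a> element (hand-rolled sketch)."""
    a = ET.fromstring(anchor_xml)
    # Assumption: the label is simply the text content of the link.
    label = "".join(a.itertext()).strip()
    lines = [
        f"[ a {a.get('typeof')} ;",
        f'  rdfs:label "{label}"@{lang} ;',
        f"  {a.get('rel')} <{a.get('href')}> ] .",
    ]
    return "\n".join(lines)

# Hypothetical anchor; the href is a placeholder, not the real target.
ANCHOR = ('<a class="citation" typeof="dcterms:Location" '
          'rel="iana:describedby" href="http://example.org/thebes">Thebes</a>')

print(anchor_to_turtle(ANCHOR))
```

A proper RDFa distiller would also resolve the prefixes and handle nesting; the point here is only to show which attribute ends up in which part of the triple.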

I like the use of the 'describedby' @rel value here. It's defined in the IANA's register of rel values ( I take the semantics to be "I'm not saying I'm linking to Thebes itself, only to a description of it." That seems nice and "semantic webby".

There's more to come but I'm getting this out there just to get the ball rolling...

1 comment:

Emerson said...

This is admittedly orthogonal to the thrust of your post, but cheers for making me aware of Pleiades. It's a fantastic resource, and not one I'd previously known to use.