Saturday, December 31, 2011

Toying with 'Knowledge Representation and Reasoning' for the Ancient World

This is a very (very!) rough opening entry in a discussion I hope to push forward in 2012. But first some preliminaries.
  • I don't know a lot about "Knowledge Representation and Reasoning" but I do know more than I did 48 hours ago. I'm in the world of "Semantic Reasoning" and "OWL 2 Ontologies". That's an interesting, and often very technical, place to be. But fun, all the same.
  • That's why I put "Toying" in the title of this post. I'm really just playing around here and figure I won't find out what I'm doing wrong if I don't share thoughts sooner rather than later.
I've opened a github repository, so I'll just dive right in using the mini-ontology that I started there. 'awo' stands for 'Ancient World Ontology' and, again, that's what I'm thinking about.

The file 'awo.owl' defines, among other things, two people: 'Augustus' and 'Lucius Cornelius Sulla'. This is an opportunity to note that the authority file I'm using for names (of people or other entities) is Wikipedia. I don't know of another publicly accessible resource with such extensive coverage combined with a simple mechanism for creating new identities. As it stands now (this github commit), awo.owl says the following about Augustus and Sulla:

 <owl:Thing rdf:about="#Augustus">
    <rdf:type rdf:resource="" />
    <is rdf:resource="#Roman_Emperor" />
    <is rdf:resource="#Pontifex_Maximus" />
    <is rdf:resource="#Tribune" />
    <owl:sameAs rdf:resource="" />
    <owl:sameAs rdf:resource="" />
 </owl:Thing>

  <owl:Thing rdf:about="#Lucius_Cornelius_Sulla">
    <rdfs:label>Lucius Cornelius Sulla</rdfs:label>
    <rdf:type rdf:resource="" />
    <is rdf:resource="#Roman_Dictator" />
    <owl:sameAs rdf:resource="" />
  </owl:Thing>

I hope some of the 'meaning' of this markup is accessible even without 'knowing' OWL. I'm asserting that there are entities (owl:Thing's) "Augustus" and "Lucius_Cornelius_Sulla". It connects those to other defined entities such as "Roman_Emperor" and "Roman_Dictator". Again, those names are taken from Wikipedia.

I know some people won't like the use of "owl:sameAs", but I think it conforms closely to the definition of that term in the OWL 2 documentation. And what about the "is" property? Here I did become concerned that none of the OWL 2 terms for indicating equivalence between "owl:Thing"'s worked. So I made up the generic "is" property to match the very generic and informal semantics of the fairly reasonable statement "Augustus is a Roman Emperor". I could have used "was" but that seemed silly.

But what about reasoning? The repository also has a file "awo-reasoned.rdf". That has the following (slightly re-ordered and abridged) statements about both Augustus and Sulla:

  <rdf:Description rdf:about="">
    <rdf:type rdf:resource=""/>
    <j.1:is rdf:resource=""/>
    <j.1:is rdf:resource=""/>

    <owl:sameAs rdf:resource=""/>
    <owl:sameAs rdf:resource=""/>

    <rdf:type rdf:resource=""/>
    <rdf:type rdf:resource=""/>
    <rdf:type rdf:resource=""/>

    <rdf:type rdf:resource=""/>
  </rdf:Description>

  <rdf:Description rdf:about="">
    <rdf:type rdf:resource=""/>
    <j.1:is rdf:resource=""/>
    <j.1:is rdf:resource=""/>
    <j.1:is rdf:resource=""/>
    <j.1:is rdf:resource=""/>
    <j.1:is rdf:resource=""/>

    <owl:sameAs rdf:resource=""/>
    <owl:sameAs rdf:resource=""/>
    <owl:sameAs rdf:resource=""/>

    <rdf:type rdf:resource=""/>
    <rdf:type rdf:resource=""/>
    <rdf:type rdf:resource=""/>
    <rdf:type rdf:resource=""/>

    <rdf:type rdf:resource=""/>
  </rdf:Description>

This file is generated by the command-line tool in the open source OWL-DL reasoner Pellet. Another win for open source as far as I'm concerned.

To the extent that the mini awo ontology hints at a useful future, it's because both Sulla and Augustus are 'known' to be "#Roman_Office_Holder"s. The ontology defines the owl:Class "Roman_Republican_Office_Holder" as all owl:Things said to be "Roman_Dictator"s. "Roman_Imperial_Office_Holder" is defined as all owl:Things said to be "Roman_Emperor"s. Both of these classes are sub-classes of "Roman_Office_Holder".
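The inference described above can be sketched in a few lines of plain Python. This is only a toy stand-in for what a reasoner does, not Pellet and not OWL semantics: the dictionary names and the two helper functions are assumptions made up for illustration, and the data simply restates the 'is' statements and class definitions given in this post.

```python
# Toy illustration of the inference step; NOT Pellet and NOT real OWL reasoning.
# All names below (IS_STATEMENTS, DEFINED_BY_IS, ...) are hypothetical.

# Individual -> values of the generic 'is' property, as asserted in awo.owl
IS_STATEMENTS = {
    "Augustus": {"Roman_Emperor", "Pontifex_Maximus", "Tribune"},
    "Lucius_Cornelius_Sulla": {"Roman_Dictator"},
}

# Class -> the 'is' value that makes something a member of that class
DEFINED_BY_IS = {
    "Roman_Republican_Office_Holder": "Roman_Dictator",
    "Roman_Imperial_Office_Holder": "Roman_Emperor",
}

# Class -> its direct superclass
SUBCLASS_OF = {
    "Roman_Republican_Office_Holder": "Roman_Office_Holder",
    "Roman_Imperial_Office_Holder": "Roman_Office_Holder",
}


def inferred_classes(individual):
    """Return every class the individual belongs to, including superclasses."""
    classes = set()
    for cls, required in DEFINED_BY_IS.items():
        if required in IS_STATEMENTS.get(individual, set()):
            classes.add(cls)
            # Walk up the subclass chain.
            parent = SUBCLASS_OF.get(cls)
            while parent is not None:
                classes.add(parent)
                parent = SUBCLASS_OF.get(parent)
    return classes


def members_of(cls):
    """Return the sorted names of all individuals inferred to be in cls."""
    return sorted(name for name in IS_STATEMENTS if cls in inferred_classes(name))


print(members_of("Roman_Office_Holder"))
```

Neither Augustus nor Sulla is ever directly said to be a "Roman_Office_Holder"; both turn up in the result only because of the class definitions, which is the point of running a reasoner.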

Looking ahead, this simple (simplistic?) demonstration suggests a world in which it is possible to search a corpus of information - be it primary texts or secondary scholarship - for references to "Roman Office Holders" and be shown all documents (or other resources) that reference either Augustus or Sulla. That would be cool.

If you dig into awo-reasoned.rdf, you'll see that everything it says about "Augustus" it also says about the corresponding VIAF URI. VIAF is the "Virtual International Authority File". Here I'm trying to (again, simply) explore the idea that if an author were to link to that well-known URI published by VIAF, then it would be discoverable that the document making the link referred to not only the Emperor but also to the more generic concept "Roman_Office_Holder". So imagine an Internet that can be queried for "All references to Roman office holders".

And we do want to support more complex queries: "All late Roman military sites in Syria within 30 kilometers of the findspots of LR coins or LR African Red-Slip". We're a long way from that but it's doable on the basis of existing technologies. And the content to support such queries is slowly coming online.

Some other bullet points in this world:
  • It's a world that I can think about because of conversations I've been having with my colleagues at ISAW, with the people running Pelagios, with people I've been writing papers and grants with. And others. There's nothing exceptionally original here. But the next step is to be part of "just doing it."
  • There needs to be a mechanism for bringing together existing RDF-based resources into a big pile of triples from which a reasoner can extract interesting relationships. The work can't be done by hand by a few individuals. But if we just let the machines run wild, we'll end up with silly conclusions. We need to find the right balance of automatic processing and community sourcing to create an "Ancient World Inference Engine" or "Ancient World Semantic Reasoner" that is actually useful.
  • And that's probably an important principle: make it useful. Here are some thoughts on that:
    • When a "third party" resource links to a URI such as "" (or its VIAF equivalent), it would be nice if there were a javascript library that showed a menu offering links based on a JSON serialization of the 'knowledge' in awo-reasoned.rdf. This is an idea that has been floating around and whose time has come.
    • The network of links to stable URIs should be harvested so that the reasoner can work across the entire Ancient World Internet. The internet is the interface that allows community sourcing.
    • Existing resources that provide stability - such as Perseus, PAS, Pleiades, DBPedia, OpenContext, and many others - should be incorporated. Keep new work to a minimum.
    • Another way of saying the above is that an "Ancient World Triple Store and Reasoner" should look to be a "pass through" resource reflecting the existing and developing state of the Internet rather than a destination itself.
    • The whole big pile of reasoned triples should be downloadable so that others can pay for the cycles to query it when they're doing something really complex. CC everything!
The above has started to wander a little bit so I'll end this post here. Let's see what happens in the next year or so...

Wednesday, September 14, 2011

ISAW Roman Pottery Reading Group: September 22

The 2011/2012 kick-off meeting of the ISAW Roman Pottery Reading Group is next Thursday, September 22 at 3:30. The topic is roughly "African pottery in the Eastern Mediterranean in Late Antiquity". As always, the readings don't cover the full range of what we could talk about:

  • Abadie-Reynal, C. 1989. “Céramique et commerce dans le bassin Égéen du IVe au VIIe siècle,” in V. Kravari, J. Lefort and C. Morrisson (edd.), Hommes et richesses dans l’Empire byzantin I. IVe-VIe siècle (Paris) 143-159.
  • Bonifay, M. 2005. “Observations sur la diffusion des céramiques africaines en Méditerranée orientale durant l’Antiquité tardive,” in F. Baratte et al. (edd.), Mélanges Jean-Pierre Sodini (Travaux et Mémoires 15), 565-81.
  • Majcherek, G. 2004. “Alexandria’s long-distance trade in Late Antiquity – the amphora evidence,” in Jonas Eiring and John Lund (edd.), Transport Amphorae and Trade in the Eastern Mediterranean. Acts of the International Colloquium at the Danish Institute at Athens, September 26–29, 2002, 229-237.
  • Bes, P.M. and J. Poblome. 2009. "African Red Slip Ware on the Move: the Effects of Bonifay’s Etudes for the Roman East," in: J.H. Humphrey (ed.): Studies on Roman Pottery from Africa Proconsularis and Byzacena (Tunisia). Hommage à Michel Bonifay (Journal of Roman Archaeology Supplementary Volume 76), 73-91. [An incomplete version of this is available online.]
The Abadie-Reynal is a classic and always worth looking at. It's important to take account of Bonifay's work so the Bes and Poblome article does that. The Majcherek gives a site specific view on the question, while also addressing large-scale historical issues. Should be fun.

Wednesday, June 22, 2011

Blogging my Digital Humanities 2011 Talk

I was all prepped to give a nice conversational version of my paper at Digital Humanities 2011 when my plane was delayed, so I had to spend an extra night in Boston, meaning my arrival in Palo Alto was bumped to after my allotted time. Oh, well. Here's a summary that presents some of what I was going to say.

The title was The Digital Materiality of Early Christian Visual Culture: Building on John 20:24-29 and the abstract is here.

My first "real" slide was a long-ish quote from the article Leonardi, P. 2010. "Digital materiality? How artifacts without matter, matter" in the online journal First Monday. Here's the quote:

I argue that treating materiality as the practical instantiation of theoretical ideas (like policies that allow women to vote help make material the idea that sexes are equal) or as what is significant in the explanation of a given context (like material evidence in a courtroom trial) provides a more useful framework for understanding how digital artifacts affect the process of organizing. I contend that moving away from linking materiality to notions of physical substance or matter may help scholars of technology integrate their work more centrally with studies of discourse, routine, institutions and other phenomena that lie at the core of organization theory, specifically, and social theory more broadly.
I've highlighted the bits I was going to focus on. "[M]ateriality as the practical instantiation of theoretical ideas" has useful overlap with how archaeologists think about materiality. We often try to "back port" from the objects we find to what people were thinking, but Leonardi's explicit connection between ideas/thought and material is enough to prime the pump within the context of a 20-minute paper.

The "moving away" idea gave me something to play off of. It's not that I disagree with Leonardi, it's that I like to think about the continuum of interplay between thought and matter that is enabled by digital surrogates of material culture. Here are two snippets from what I said:
Just as the creation of the surviving material record should be recognized as the cumulative action of many individuals, it is likely that exploration of that record will be enabled by many projects and institutions working within their own areas of expertise and with content specific to their domain (Heath 2010, Terras 2010). It is the interactions of a series of self-digitizing and independent communities – here Early Christian textual studies and Numismatics – that can recover relationships between physical object and human thought that is a primary goal of materiality as a methodological approach .... Digital materiality is therefore an act of transmission (Liu 2004) so that its deficiencies leave it open to criticism.
I'm being selective in quoting myself so you may want to read the above in context. It has a slightly different twist there.

More briefly, my point is that transmission of digital surrogates for material culture will provide opportunities for a new/re-emphasis on the relation between object and thought - that is, "materiality" - in the ancient world and in the study of the ancient world. It will probably do so elsewhere but I'm an Ancient Med. person so that's where I focus. If we think of digitization as de-materialization, it will enable new appreciation of the material.

But what leads me to say that?

My specific example is the relationship between the text of John 20:19-29 and physical manifestations of it. That's basically the story of "Doubting Thomas", a phrase meaning someone who requires physical, unambiguous proof before believing something.

Those verses from the Gospel of John chapter 20 are readily accessible online. Here's the KJV version. Or the New International Version. Do you read Macedonian? Go for it. Maybe you prefer Arabic?

The story culminates in Jesus saying, "Because you have seen me, you have believed; blessed are those who have not seen and yet have believed." You can get commentary on that phrase here (scroll to the bottom).

I threw in those links to show that the accessibility of the text doesn't come from the academy, that is, from the traditional home of Digital Humanities. Sure, the New Testament is both studied within universities and is available through DH stalwarts such as Perseus. But much of the digital action around this text comes out of the self-digitizing community that is the Christian web. I think that's cool and something we should be paying attention to. I discussed that more in the blog post "Digital Epistemology as Mediated through Tessellated Self-Digitizing Communities" over at Posterous.

But these links are not materiality. They're virtual all the way.

It's easy to materialize the text of John 20:19-24 in a modern context. Here's a low-res image of the text (minus the beginning of verse 19) from the United Bible Societies' Greek New Testament (GNT).

If we look more closely at verse 21 - which in translation is "Again Jesus said, 'Peace be with you! As the Father has sent me, I am sending you.'" - we see:

The top image is the text in the GNT and the bottom is the critical apparatus or app. crit. If you look after the // in the app. crit., you see the Greek word παλιν followed by the Hebrew letter aleph. That's the symbol for a 4th century manuscript known as the Codex Sinaiticus. I've linked to the Wikipedia article for the codex but it's important for my talk that a digital facsimile of the manuscript is available online. Go take a look, it's a cool site.

And just by way of introduction, the Codex Sinaiticus is a fourth century manuscript that is one of our earliest complete versions of the New Testament. Much of it is now in the British Library but came there indirectly from Saint Catherine's Monastery on the Sinai Peninsula.

Here's a screen shot of the site's version of John 20:21.

Counting down seven lines from the top of the middle column brings you to the Greek "εἶπεν οὖν αὐτοῖς πάλιν" or "He said to them again...". Note that in the GNT version the text is "εἶπεν οὖν αὐτοῖς [ὁ Ἰησοῦς] πάλιν" or "Jesus said to them again...". The brackets around "ὁ Ἰησοῦς" are an indication of some uncertainty about the reading of the "original" text. The Codex Sinaiticus delivers one component of the material basis of that uncertainty. That's digital materiality.

Here's another CS screenshot:

You can see the extent of correction as a second scribe addressed both basic mistakes and subtle issues of reading in the original product.

We can expand the digital materiality of this text by linking to a gold roundel - "small round disk" - in the collection of the American Numismatic Society. Here's the screen shot of that page:

To the left of the central Jesus you see Thomas reaching out to touch Jesus' wounds. There's also a slightly irregular transcription of the Greek for Thomas' declaration "My Lord, My God" and of Jesus' response to the effect that those who have not required such proof are blessed.

Without going into too much detail, disks of this sort are believed to have been produced in Egypt. This piece doesn't come with a findspot but it is reasonable to invoke it next to the Codex Sinaiticus.

It is another materialization of John 20:19-29. One that combines text and image. It is a projection into physical space and across time of the message that Christian believers who did not have the opportunity to see Christ can be blessed.

These two objects - the Codex and the ANS disk - show that materiality does not remove the reader or viewer from our understanding of how texts worked. These materializations remind us that texts are physical objects that are responded to by people, and that one response is to change the materiality, as in editing the Codex. Certainly, one response is to debate the meaning of a text. The story of the Doubting Thomas is understood by many modern scholars as a statement against those who denied the humanity of Christ, among whom were the Gnostics, active in Egypt. The ANS disk is therefore part of an ongoing debate about Christ's nature.

The point of this post is not to go into great depth about what the pairing of these objects tells us about the role of materiality in Late Roman/Byzantine Egypt. Instead, I want to stress that the opportunity to think about that issue with such relative ease arises from acts of independent self-digitization that exist within wider contexts of topically related efforts also engaging in self-digitization. That leads to an environment in which intellectual risk taking is rewarded.

I don't want the series of inferences above to be pigeon-holed into either saying something about the past or about the present. I think we're at a stage of Digital Humanities where we can recognize that we are doing both. We do not know what questions about the past that modern Digital Materiality will allow us to ask, but I bet we're about to find out.

Thursday, March 31, 2011

Brief Anecdote about Discoverability, Sigma Tables, and the Athenian Agora

In the middle of today's meeting of the ISAW Roman Pottery Reading Group, the issue of "sigma tables" came up. These semi-circular marble tables are invoked by both of today's authors so it was natural to pause on the topic. At which point I mentioned, "there's one in the Agora and I bet it's online." Quick Google search on "marble sigma table agora" and we were a click away from the Agora's database.

That's object A 3869.

My only point is that because the object was easy to find via the public Internet, we were able to include it in our conversation. It was very useful to compare a specimen to Hudson's and Vroom's analysis and to the additional visual evidence they each gather.

As a reminder, here's what we read:
  • Nicholas Hudson. 2010. "Changing Places: The Archaeology of the Roman Convivium." AJA 114.4: 663-695.
  • Joanita Vroom. 2008. ‘The archaeology of late antique dining habits in the eastern Mediterranean: A preliminary study of the evidence’, in: L. Lavan, E. Swift and T. Putzeys (eds.), Objects in Context, Objects in Use. Material Spatiality in Late Antiquity (Late Antique Archaeology 5), Leiden and Boston: 313-361.

Friday, March 25, 2011

Cool pics of a Roman Hoard

I'm slow in getting round to this, but really, do visit the page. The pictures of the coins are cool. Even cooler are the pictures of the large vessel they were buried in. Those of us in numismatics frequently see the dry phrase, "Found in pot", or the more concise term, "Pot hoard". This page will help you visualize what that really means.

Here's a sample:

Monday, March 21, 2011

Roman Pottery Reading Group at the Institute for the Study of the Ancient World

A few of my ISAW/NYU colleagues and I have begun a "Roman Pottery Reading Group," which seems to be settling into a sort of every-other-week-ish-y schedule.

We began with three "Romanization" articles:
  • D. Malfitana, J. Poblome and J. Lund. 2005. "Late Hellenistic imports of eastern sigillata A in Italy. A socio-economic perspective," Babesch 80: 199-212.
  • Poblome, Jeroen and Michael Zelle. 2002. “The table ware boom: a socio-economic perspective from western Asia Minor” in Christof Berns, Henner von Hesberg, Lutgarde Vendeput and Marc Waelkens (eds.), Patris und Imperium, Leuven: 275-287.
  • Rotroff, S. 1997. "From Greek to Roman in Athenian Ceramics," in M.C. Hoff and S.I. Rotroff (eds.), The Romanization of Athens, Oxford: 97-116.
It was an added bonus that my colleague Billur Tekkök, in the States on a Fulbright Fellowship, could join us for that first session.

Next we read:
The point here was to look at representative samples of 40 years of publication from one site. Put simply: what has changed in techniques and approaches over that time? A little "inside baseball" but a fun conversation.

Next up is ceramics and dining:
  • Nicholas Hudson. 2010. "Changing Places: The Archaeology of the Roman Convivium." AJA 114.4: 663-695.
  • Joanita Vroom. 2008. ‘The archaeology of late antique dining habits in the eastern Mediterranean: A preliminary study of the evidence’, in: L. Lavan, E. Swift and T. Putzeys (eds.), Objects in Context, Objects in Use. Material Spatiality in Late Antiquity (Late Antique Archaeology 5), Leiden and Boston: 313-361.

We're meeting Thursday, March 31 at 3:00 PM Eastern Daylight Time. It's tempting to see if anybody wants to join us virtually. If really, truly, "yes", I'll see what we can do.

Friday, March 4, 2011

LRC/Phocaean Red Slip at Alexandria Troas

Anybody who would enjoy seeing a nice color picture of LRC/Phocaean Red-Slip rim sherds should take a look at figure 23 on page 15 of Stefan Feuser's article "The Roman Harbour of Alexandria Troas, Turkey" in volume 40.1 (2010) of The International Journal of Nautical Archaeology, doi:j.1095-9270.2010.00294.x.

From Typed Links to Annotations in Ancient Geography

I've been participating in the discussions of the Pelagios Project's plans to establish semantic web/linked data conventions for linking geographic information in the ancient world. is listed as a partner and it's a good group of people who are coming together to think about the issue.

As always, the individuals and projects involved don't want to re-invent the wheel. And, also as always, some new work - even if it's just establishing a domain-specific use for existing standards - is necessary. That last is what I'm thinking about right now.

I mean the title of this post to establish an axis of complexity when it comes to relating a web-based resource to a geographic entity. A "typed link" is basically plain-old HTML with a little bit of RDF-sugar to say that the end-point is a geographic entity. I've already spoken about doing this in earlier posts. Here, let me start with the RDF/Turtle:

@prefix dcterms: <> .
@prefix geo: <> .
@prefix powder: <> .
@prefix rdfs: <> .

[] a dcterms:Location,geo:SpatialThing;
powder:describedby <>;
rdfs:label "Rome" .
This is the RDFa:

<a href="" typeof="dcterms:Location geo:SpatialThing" rel="powder:describedby" property="rdfs:label">Rome</a>

Again, that's pretty simple html that adds a little in-place information that the link is to a geographic entity that is defined at a particular URL. There are many tools that can parse that link and do interesting things like show a map. Hence the term I'm using here, "typed link". And I include as an "interesting thing" the now prosaic ability of a user to click on that link when it's rendered by a browser. Human readable and machine actionable. Win, win.
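As a rough illustration of the "machine actionable" half of that claim, here is a minimal Python sketch that pulls typed links out of a page using only the standard library. It is not a conforming RDFa processor; it only looks for `<a>` elements that carry a `typeof` attribute. The class and function names are made up for this example, and the `example.org` address is a hypothetical placeholder, not the real URL of any place resource.

```python
from html.parser import HTMLParser


class TypedLinkParser(HTMLParser):
    """Collect <a> elements that carry a 'typeof' attribute (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._current = None

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "a" and attr.get("typeof"):
            self._current = {
                "href": attr.get("href"),
                "types": attr["typeof"].split(),  # whitespace-separated class names
                "rel": attr.get("rel"),
                "label": "",
            }

    def handle_data(self, data):
        # Text inside the open typed link becomes its human-readable label.
        if self._current is not None:
            self._current["label"] += data

    def handle_endtag(self, tag):
        if tag == "a" and self._current is not None:
            self.links.append(self._current)
            self._current = None


def find_typed_links(html):
    parser = TypedLinkParser()
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical placeholder URL; a real document would point at a place resource.
SAMPLE = (
    '<p>All roads lead to <a href="https://example.org/places/rome" '
    'typeof="dcterms:Location geo:SpatialThing" rel="powder:describedby" '
    'property="rdfs:label">Rome</a>.</p>'
)

for link in find_typed_links(SAMPLE):
    print(link["label"], link["types"], link["href"])
```

A tool that wanted to show a map would take the `href` values of links typed as locations and look up coordinates at those addresses; that lookup is outside the scope of this sketch.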

To be clear, with this post I am suggesting to my Pelagios colleagues that we use this or a similarly "light-weight" convention for the simple case of a link to a geographic entity. And yes, I don't mind if you use dcterms:Location, geo:SpatialThing or both. Those are the most widespread RDF Classes for indicating that a resource is a geographic entity.

An "annotation" is something different. The source document is trying to say something about the geographic entity. In this case, consensus seems to be building around the Open Annotation Consortium. That's a good thing on the "use existing work" principle. This time I'll start with a sentence: "Rome was the capital of the Roman Empire". Trivial, I know, but the point is to focus on the markup.

In RDF/Turtle, I want to say something like:

@prefix dcterms: <> .
@prefix geo: <> .
@prefix oac: <> .
@prefix rdfs: <> .

_:oacEx a oac:Annotation ;
oac:hasTarget <>;
oac:hasBody "was the capital of the Roman Empire" .

# choose one or both of dcterms:Location or geo:SpatialThing
<> a dcterms:Location, geo:SpatialThing ;
rdfs:label "Rome" .

The top level concept is oac:Annotation, a class that encapsulates the relationship between a body (the thing annotating) and a target (the thing annotated). This RDF/Turtle basically says "There's a location 'Rome' that 'was the capital of the Roman empire'." In RDFa, that's:
<?xml version="1.0" encoding="UTF-8" ?>
<html xmlns:dcterms=""
<span id="annotation1" typeof="oac:Annotation" about="#annotation1" >
<a rel="oac:hasTarget" href="">Rome</a> <span property="oac:hasBody">was the capital of the Roman Empire.</span>
<span style="display:none" about="" typeof="dcterms:Location geo:SpatialThing"></span>

This is a first crack at the RDFa so note the 'hidden' span that says the Pleiades URI is a dcterms:Location/geo:SpatialThing. I'm guessing I or somebody else can do better than that.

But the real point of this post is to propose that ladder of complexity. Use a combination of 'powder:describedby' along with dcterms:Location and/or geo:SpatialThing when that will suffice. Open Annotation is for more complex situations. Reactions?
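To make the ladder concrete, here is a small Python sketch that emits the light-weight typed link when there is nothing to say about the place, and the Open Annotation pattern when there is. The function names, the default annotation id, and the `example.org` URI are all assumptions for illustration; the attribute patterns are copied from the two RDFa snippets above, including the hidden span, and nothing here is an agreed Pelagios convention.

```python
from html import escape

# Classes used in this post to say "this is a geographic entity".
PLACE_TYPES = "dcterms:Location geo:SpatialThing"


def typed_link(uri, label):
    """Simple case: a link whose end-point is typed as a geographic entity."""
    return (
        f'<a href="{escape(uri)}" typeof="{PLACE_TYPES}" '
        f'rel="powder:describedby" property="rdfs:label">{escape(label)}</a>'
    )


def annotation(uri, label, body, anno_id="annotation1"):
    """Complex case: the document says something about the place."""
    anno_id = escape(anno_id)
    return (
        f'<span id="{anno_id}" typeof="oac:Annotation" about="#{anno_id}">'
        f'<a rel="oac:hasTarget" href="{escape(uri)}">{escape(label)}</a> '
        f'<span property="oac:hasBody">{escape(body)}</span>'
        # Hidden span that types the target, as in the first-crack RDFa above.
        f'<span style="display:none" about="{escape(uri)}" typeof="{PLACE_TYPES}"></span>'
        "</span>"
    )


def mark_up_place(uri, label, body=None):
    """Pick the lowest rung of the ladder that is sufficient."""
    if body is None:
        return typed_link(uri, label)
    return annotation(uri, label, body)


ROME = "https://example.org/places/rome"  # hypothetical placeholder URI
print(mark_up_place(ROME, "Rome"))
print(mark_up_place(ROME, "Rome", "was the capital of the Roman Empire."))
```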

Wednesday, March 2, 2011

Linking from Citation to Example in Numismatic (and other) Scholarship

I let myself follow a tangent today. It starts with noting that the article by C. Lorber and A. Meadows that I'm preparing for publication "Review of Ptolemaic Numismatics" makes frequent reference to coin types described in J. Svoronos, Ta nomismata tou kratous ton Ptolemaion. Athens, 1904-1908. It is an obvious feature of such a publication that those references lead readers to information about those coins.

To start on the journey towards such linking, I created URIs for all coin types defined in Svoronos' typology. There's very little description there, and what is there is cribbed from C. Lorber's translation.

Now, if you go to this paragraph in Lorber and Meadows, which makes reference to Svoronos, you'll see that the link to "Sv. 1424" is live. Look towards the end of the paragraph. And note that it's possible to refer to single <p> elements in the article. That's because each one has an @id with a unique value. That's cool and important.
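A minimal sketch, in Python, of why those @id values matter: given an article's markup, it lists a fragment URL for every paragraph and complains if any `<p>` lacks an id or if two share one. The class and function names and the `example.org` address are hypothetical; this is not the tooling behind ISAW Papers.

```python
from collections import Counter
from html.parser import HTMLParser


class ParagraphIdCollector(HTMLParser):
    """Record the id of every <p> element (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.ids = []
        self.missing = 0

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            pid = dict(attrs).get("id")
            if pid:
                self.ids.append(pid)
            else:
                self.missing += 1


def paragraph_links(base_url, html):
    """Return one citable URL per paragraph, or raise if ids are unusable."""
    collector = ParagraphIdCollector()
    collector.feed(html)
    collector.close()
    duplicates = sorted(pid for pid, n in Counter(collector.ids).items() if n > 1)
    if collector.missing or duplicates:
        raise ValueError(
            f"{collector.missing} paragraph(s) without id; duplicate ids: {duplicates}"
        )
    return [f"{base_url}#{pid}" for pid in collector.ids]


# Hypothetical article fragment and address.
ARTICLE = '<div class="section" id="s1"><p id="p1">First.</p><p id="p2">Second.</p></div>'
for url in paragraph_links("https://example.org/article.xhtml", ARTICLE):
    print(url)
```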

Follow the link and you'll see further links to the ANS collection and to a second site. The former is a rock-solid stable URI but the coin hasn't been photographed (hint, hint). The latter is to an interesting project that is digitizing type corpora for many series of coins. As the editor of ISAW Papers I don't have to worry if it's super-stable. I rely on the type page to provide reasonable links and to keep them current.

The end result is a hint of a richly linked and illustrated future. Again, cool. I'd like to cross-the-bridge (as it were) and deliver images of Sv. 1424 while readers are still within the "environment" of Lorber and Meadows. But the first step is implementing such links, then we can work on the user experience.

In other news.... there is now a github repository for ISAW Papers.

Tuesday, March 1, 2011

"Archival" and "Presentation" versions of (x)html-based scholarship

Briefly...The presentation version changes the extension to ".html", adds some formatting to fix the page width and to justify the body paragraphs. It also adds an appendix of links to named entities at the end. That last suggests an interesting future.

The goal here is to maintain a focus on an archival version with very little formatting in it, while also exploring what the "nicer" presentation version can look like. Eventually this content will appear in a CMS-like environment. That should be attractive and functional so I'm figuring out what that means. In time, I'll add features along the lines of "pop-up" windows for geographic entities and the like. Not sure exactly what that entails but we'll find out as we go along.

And I'll move this to github in the near-ish future.

Wednesday, February 23, 2011

Test Bed for (X)HTML Conventions for Scholarly Publication

The main reason I joined the Institute for the Study of the Ancient World at NYU was to be part of initiating a program of digital publication of peer-reviewed scholarship. We haven't announced anything formally and this blog post isn't that announcement. It is the beginning of a nuts-and-bolts conversation about the markup of digital scholarship that is intended to encourage long-term viability, flexible re-use, and easy display (among many other things).

To get right down to business, is the very temporary URL for a preprint version of "Review of Ptolemaic Numismatics, 1996 to 2007" by Catherine Lorber and Andrew Meadows. I'm very grateful to Andy and Cathy for their willingness to be part of this experiment. Their work is largely done. Now it's up to me to make progress on the markup and I'm hoping to do that in a very public way.

But where to begin the conversation? I think the best approach is to admit I'm in the middle of things and just start laying out issues and thoughts. Keep in mind that everything is subject to change...
  • The format for ISAW digital publications is XHTML with RDFa. XHTML (for now 1.1 but moving to XHTML5) is a widely supported standard with excellent tooling that is directly viewable in many contexts. That makes it appropriate for long-term archival storage of born-digital scholarship.
  • Internal reference structures are important. For now this means each <p> element has an @id. div's of class 'section' also have @id attributes. This is in anticipation of using the semantic elements of HTML5.
  • Named entities will be tagged with links to stable resources describing those entities. For geography, Pleiades. For many other entities, Wikipedia. See below for RDFa patterns.
  • Existing ontologies/vocabularies will be used whenever possible. Geographic entities are typed as "dcterms:Location". That sort of thing.
  • Basic constructs for marking up bibliography and footnote-like structures are lacking for HTML-based markup languages. There are lots of semi-complete "best practices" but narrowing these down to a consistent and flexible convention will be an important process.

Looking ahead:
  • Multiple formats will be supported. We will distribute this text as "raw" valid xhtml. It will be hosted in a more interactive environment that does slick things like make maps, etc. Epub, pdf... all those are coming. Again, the ease with which a base XHTML representation can be converted to these other formats is one reason to use XHTML.
  • We'll use CC licenses. Right now the document is CC-BY-NC-ND. We'll drop the ND eventually, perhaps the NC as well. The preprint is ND as a signal that a better version is coming from us.

A word on RDFa (a standardized way of embedding information in XHTML pages)...

The basic pattern that I'm using to markup named entities is illustrated by the sentence:
In a study of tax receipts from early Ptolemaic <a class="citation"
typeof="dcterms:Location" rel="iana:describedby"

That produces the RDF/Turtle
[ a dcterms:Location ;
rdfs:label "Thebes"@en ;
iana:describedby <>].
You can see the turtle for the whole document at

An "English" equivalent of the turtle snippet is 'There is a site in the text with label "Thebes" and a description at'

I like the use of the 'describedby' @rel value here. It's defined in the IANA's register of rel values. I take the semantics to be "I'm not saying I'm linking to Thebes itself, only to a description of it." That seems nice and "semantic webby".

There's more to come but I'm getting this out there just to get the ball rolling...

Monday, February 21, 2011

Quick poll: Worldcat, Library of Congress, or Both

There are lots of ways of encoding bibliographic data on the web, but this post isn't about that problem. Instead, I'm wondering what is "the community's" preference between Worldcat and the Library of Congress when creating Semantic Web/Linked Open Data references.

As an example, the URIs and each lead to information about John Hayes' Late Roman Pottery published in 1972.

Which one of these is preferable as the long-term description of this volume: Worldcat or LOC? The use-case is a digital publication with a bibliography that ideally includes a link to one or the other or both for all printed volumes or other appropriate entities.

Perhaps a discussion will ensue in the comments but here are some quick issues:
  • There are multiple URIs for that one volume in Worldcat. gets you to the Danish Union Catalog.
  • There are still concerns about the licensing of Worldcat data.
  • The LOC record is to a physical volume in a single national library and may not be intended as a description of the abstract concept (e.g. a FRBR Work). I don't know that Worldcat URIs solve this problem but they have the implication of a higher level of abstraction.

Votes and/or comments are appreciated.

Monday, February 7, 2011

Quick poll: Wikipedia or DBPedia?

I've created a poll near the upper right of this page. In longer form: when making persistent "Linked Data/Semantic Web" references to concepts described in Wikipedia, is it "best practice" to link to Wikipedia or to DBPedia? As in, "" or ""?
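
For anyone weighing the two options, the URI forms are closely related: DBpedia derives its resource names from English Wikipedia article titles, so one can be computed from the other. A small sketch, assuming plain article URLs on en.wikipedia.org; the function name is made up, 'Augustus' is just an illustrative article, and titles with unusual characters may be encoded differently on the DBpedia side.

```python
from urllib.parse import urlsplit


def wikipedia_to_dbpedia(wikipedia_url):
    """Map an English Wikipedia article URL to the corresponding DBpedia resource URI."""
    parts = urlsplit(wikipedia_url)
    if parts.netloc != "en.wikipedia.org" or not parts.path.startswith("/wiki/"):
        raise ValueError(f"not an English Wikipedia article URL: {wikipedia_url}")
    # Everything after "/wiki/" is the article title, reused as the resource name.
    title = parts.path[len("/wiki/"):]
    return f"http://dbpedia.org/resource/{title}"


if __name__ == "__main__":
    print(wikipedia_to_dbpedia("http://en.wikipedia.org/wiki/Augustus"))
```

Because the mapping is this mechanical, choosing one form in a publication does not lock out the other; a consumer can translate as needed.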

Friday, February 4, 2011

Access to Roman Art: Observations by Peter Stewart

The last few times I've gone to speak about issues of scholarly communication/digital humanities/digital archaeology/etc, I've opened up with a quote from Peter Stewart's 2008 book The Social History of Roman Art [Worldcat]. That's a great little book, and I was particularly pleased when reading it that Stewart is explicit about the effects of access to evidence and images on his selection and narrative. And I was further pleased that he talks about his personal efforts to solve those problems. I'll illustrate this by a series of passages given in their order of appearance:
Unfortunately, my comments in the Introduction about the problems of acquiring images were born out in the book's preparation, and I had very considerable difficulties and delays in acquiring most of the images reproduced here. I therefore owe a special debt to those who helped me to obtain pictures, and to those image-providers who waived or reduced reproduction fees. (p. xv)
Then from that introduction:

To an extent, however, these are all obvious problems of evidence and interpretation which are familiar in any branch of historical study. Other problems are insidious and lie unremarked in the methodological hinterland of books like this one. I have said that the use of examples must be highly selective. But behind any book on Roman art, there are processes of selection that are largely beyond the author’s control. Most Roman art historians will never, in their lifetime, see more than a tiny percentage even of the more significant works that survive. This is not simply because of the magnitude of this great body of material. It is also because most pieces are inaccessible. Many of the finest and most interesting Roman antiquities are in private collections, and many of these are unpublished, sometimes because of scholars’ anxieties about the legality of their origins. However works preserved in museums can be at least as difficult to access. Few museums are able to exhibit more than a small minority of the objects they hold. It is not infrequent (or surprising) for some of the objects in storage to be, effectively, lost, and for other reasons it may be hard for specialists to see material, particularly if it has been excavated recently. New discoveries may take many years to become familiar within the field, and even longer to filter into general, synoptic studies of Roman art.

So, for a variety of reasons, authors depend heavily on other people's publications of Roman art, where they exist, and on their illustrations. The photographs themselves are usually supplied by the museums that own the work concerned, or sometimes by commercial agencies. In many cases no photograph exists, and new photography may not be permitted. In other cases, the acquisition of photographs proves lengthy or impossible. Moreover, the photographs (especially colour images) and the permission to reproduce them in print can be extremely costly both for individual authors and for their publishers. (p. 8)

The passages need to be read in context. It's not an angry book, and these introductory comments are followed by an interesting and challenging extended essay on the topic indicated by the title. I can highly recommend it. But back to the issue of access, here's a passage from the closing Bibliographical essay:
Finally, the photo-sharing website contains thousands of images relevant to Roman art, many of them with 'Creative Commons' copyright licenses that make them easy to use legitimately for, e.g. educational purposes. Within that site the 'Chiron' group especially is dedicated to making images available for classical teaching and research. This site carries many of my own photographs (under the screen name 'Tintern'), including colour images of the House of the Vettii and other sites mentioned in this book. (p. 174)
So mad props to Dr. Stewart for raising the issue of access and then doing something about it. A book from CUP in which the author cites his account? That's progress.

Monday, January 31, 2011

In-house commenting systems may not be necessary

Somewhat wishy-washy title, I know.

But here's my point: I look forward to a world of stable URIs for intellectual content in which responses to scholarship and primary data are distributed around the Net.

A case in point: my NYU colleague Chuck Jones blogged about the digitization of some of Blegen's diaries by the American School of Classical Studies.

If you look at the bottom of the post, you'll see that he included the Pleiades URIs for both Mycenae and Tiryns.

It is now the case that a Google search for the Tiryns URI lists Chuck's AWOL post.

Assuming that ASCSA doesn't move that resource to a different URI and that the post remains available, stable URIs for Tiryns and Mycenae have now been permanently associated with the ASCSA resource. And that with the publisher of the information doing nothing. (Though it would be nice if ASCSA ditched the "index.php" from their URIs. See here.)

And note that I'm walking a fine line in this post. The Pleiades URIs that Chuck included explicitly in his post don't appear in the text of mine. I don't see any reason to clog up the Google search with this meta-meta-commentary.

By way of slightly living up to the title, my point is that such a decentralized "commenting system" should be encouraged. If you're able to link from your content to a stable URI that more-or-less represents the same concept, do so. And use such URIs when you're talking about other people's work. That will encourage a distributed network of publication and response that is robust, open and encompasses many forms of expression, from tweets to blog posts to more formal work and beyond.