Sunday, April 28, 2013

Last Post

I'm pretty sure this will be the last post to this blog. I'm not going to delete it, but I think I should advertise the fact that I no longer use it to communicate my thoughts about digital humanities, Roman pottery, etc. Recently, I've been using Google+, Twitter, and (for better or for worse) Facebook for that purpose. My guess is I'll find reasons to piggy-back on other people's blogs as a guest contributor, so I thank you in advance should you ever give me that opportunity...

Wednesday, June 20, 2012

A Summer of Roman Pottery Pictures?

I've been posting a few pictures of Roman pottery (writ large) to my Twitter feed. My colleague at ISAW Irene Soto has posted a nice image of an amphora display case from the Crypta Balbi in Rome to my wall on Facebook.

Perhaps more pottery-loving folk want to join in the fun? I prefer Twitter for this sort of sharing because it skips the Facebook "confirm friend" step. Tweet and everyone can see it.

Then in the fall Irene and I will host an informal get-together at ISAW along the lines of "What Roman Pottery I Photographed this Summer." With an emphasis on the "informal". As in, "host" may be a strong word and it's not something that we'd even call a "workshop", let alone a "conference". We'll at least set up a projector in a room and share what we have. Obviously, not many people will be able to come but perhaps it will work out for at least a few folk already in NYC.

How about a hashtag of #rompot2012 for tweets? I'll start using that.

And I don't mean this to be a call for posting unpublished material. More along the lines of, "If a museum lets you take pictures, that may well mean they don't mind you tweeting." Make your own determination and join in if you can.

Finally, yes, this can work for other categories of ceramics, and for pieces not in museums, or other objects for that matter. Roman pottery is on my mind right now, but so are many other things.

Tuesday, May 29, 2012

A Quote from the NEH LAWDI Proposal


In February 2011, Tom Elliott, John Muccigrosso and I wrote and submitted a proposal to the NEH's "Institutes for Advanced Topics in the Digital Humanities" program. It was accepted, which is way cool. So on Thursday - the day after tomorrow (yikes!) - we begin the "Linked Ancient World Data Institute" (#LAWDI).

We definitely wrote about RDF, RDFa, SPARQL, etc. But to give you a flavor of what we're trying to achieve, here's a quote from the narrative:
Finally, we stress that it is not our intent to ask LAWDI participants to adhere to a single standard that dictates how each project and discipline brings its intellectual content into digital form. We recognize that existing data is heterogeneous and that many digital humanities projects have invested substantial time and money in creating resources according to their own needs. In this environment, any attempt to create a single unifying standard of data representation will fail and so we have not adopted that language in this proposal. We are also sensitive to the principle that overly detailed standards presume that a discipline knows all that it wants to say about its topic of study. This is certainly not the case for the Ancient World, where the basic terms of analysis continue to change in exciting ways. Of course, recognizing complexity as the starting point of discussions does not mean that useful interoperability cannot be achieved.
I think the above makes a lot of sense at this stage in digitizing the Ancient World. I'd also reformulate the sentiment as "Sure it's your data, but put it on the public internet in such a way that others can make use of it with existing tools and best practices."

Saturday, May 19, 2012

Test of awld.js

I am trying to install awld.js on this blog. If it worked, this link to Wikipedia will show a popup when you hover over it: http://en.wikipedia.org/wiki/African_red_slip.

Here's a link to Pleiades: http://pleiades.stoa.org/places/648772

I may delete this post after publication.

...

It seems to be working but I need to do some work to isolate the awld.js css from the blogger css. That counts as a detail.

Still playing... Corinth.

Saturday, December 31, 2011

Toying with 'Knowledge Representation and Reasoning' for the Ancient World

This is a very (very!) rough opening entry in a discussion I hope to push forward in 2012. But first some preliminaries.
  • I don't know a lot about "Knowledge Representation and Reasoning" but I do know more than I did 48 hours ago. I'm in the world of "Semantic Reasoning" and "OWL 2 Ontologies". That's an interesting, and often very technical, place to be. But fun, all the same.
  • That's why I put "Toying" in the title of this post. I'm really just playing around here and figure I won't find out what I'm doing wrong if I don't share thoughts sooner rather than later.
I've opened a github repository at https://github.com/sfsheath/awo so I'll just dive right in using the mini-ontology that I started there. 'awo' stands for 'Ancient World Ontology' and, again, that's what I'm thinking about. 

The file 'awo.owl' defines, among other things, two people: 'Augustus' and 'Lucius Cornelius Sulla'. This is an opportunity to note that the authority file I'm using for names (of people or other entities) is Wikipedia. I don't know of another publicly accessible resource with such extensive coverage combined with a simple mechanism for creating new identities. As it stands now (this github commit), awo.owl says the following about Augustus and Sulla:

 <owl:Thing rdf:about="#Augustus">
    <rdfs:label>Augustus</rdfs:label>
    <rdf:type rdf:resource="http://schema.org/Person" />
    <is rdf:resource="#Roman_Emperor" />
    <is rdf:resource="#Pontifex_Maximus" />
    <is rdf:resource="#Tribune" />
    <owl:sameAs rdf:resource="http://dbpedia.org/page/Augustus" />
    <owl:sameAs rdf:resource="http://viaf.org/viaf/18013086" />
  </owl:Thing>

  <owl:Thing rdf:about="#Lucius_Cornelius_Sulla">
    <rdfs:label>Lucius Cornelius Sulla</rdfs:label>
    <rdf:type rdf:resource="http://schema.org/Person" />
    <is rdf:resource="#Roman_Dictator" />
    <owl:sameAs rdf:resource="http://dbpedia.org/page/Lucius_Cornelius_Sulla" />
  </owl:Thing>

I hope some of the 'meaning' of this markup is accessible even without 'knowing' OWL. I'm asserting that there are entities (owl:Thing's) "Augustus" and "Lucius_Cornelius_Sulla". It connects those to other defined entities such as "Roman_Emperor" and "Roman_Dictator". Again, those names are taken from Wikipedia.

I know some people won't like the use of "owl:sameAs", but I think it conforms closely to the definition of that term in the OWL 2 documentation. And what about the "is" property? Here I did become concerned that none of the OWL 2 terms for indicating equivalence between "owl:Thing"s worked. So I made up the generic "is" property to match the very generic and informal semantics of the fairly reasonable statement "Augustus is a Roman Emperor". I could have used "was" but that seemed silly.

But what about reasoning? The repository also has a file "awo-reasoned.rdf". That has the following (slightly re-ordered and abridged) statements about both Augustus and Sulla:

  <rdf:Description rdf:about="http://example.org/awo#Lucius_Cornelius_Sulla">
    <rdf:type rdf:resource="http://schema.org/Person"/>
    <j.1:is rdf:resource="http://dbpedia.org/page/Roman_Dictator"/>
    <j.1:is rdf:resource="http://example.org/awo#Roman_Dictator"/>

    <owl:sameAs rdf:resource="http://dbpedia.org/page/Lucius_Cornelius_Sulla"/>
    <owl:sameAs rdf:resource="http://example.org/awo#Lucius_Cornelius_Sulla"/>

    <rdf:type rdf:resource="http://example.org/awo#Office_Holder"/>
    <rdf:type rdf:resource="http://example.org/awo#Roman_Office_Holder"/>
    <rdf:type rdf:resource="http://example.org/awo#Roman_Republican_Office_Holder"/>

    <rdf:type rdf:resource="http://www.w3.org/2002/07/owl#Thing"/>
  </rdf:Description>

  <rdf:Description rdf:about="http://dbpedia.org/page/Augustus">
    <rdf:type rdf:resource="http://schema.org/Person"/>
    <j.1:is rdf:resource="http://example.org/awo#Roman_Emperor"/>
    <j.1:is rdf:resource="http://dbpedia.org/page/Roman_Emperor"/>
    <j.1:is rdf:resource="http://example.org/awo#Pontifex_Maximus"/>
    <j.1:is rdf:resource="http://dbpedia.org/page/Pontifex_Maximus"/>
    <j.1:is rdf:resource="http://example.org/awo#Tribune"/>

    <owl:sameAs rdf:resource="http://viaf.org/viaf/18013086"/>
    <owl:sameAs rdf:resource="http://dbpedia.org/page/Augustus"/>
    <owl:sameAs rdf:resource="http://example.org/awo#Augustus"/>

    <rdf:type rdf:resource="http://example.org/awo#Office_Holder"/>
    <rdf:type rdf:resource="http://example.org/awo#Roman_Office_Holder"/>
    <rdf:type rdf:resource="http://example.org/awo#Roman_Religious_Office_Holder"/>
    <rdf:type rdf:resource="http://example.org/awo#Roman_Imperial_Office_Holder"/>

    <rdf:type rdf:resource="http://www.w3.org/2002/07/owl#Thing"/>
  </rdf:Description>

This file is generated by the command-line tool in the open source OWL-DL reasoner Pellet. Another win for open source as far as I'm concerned.

To the extent that the mini awo ontology hints at a useful future, it's because both Sulla and Augustus are 'known' to be "#Roman_Office_Holder"s. The ontology defines the owl:Class "Roman_Republican_Office_Holder" as all owl:Things said to be "Roman_Dictator"s. "Roman_Imperial_Office_Holder" is defined as all owl:Things said to be "Roman_Emperor"s. Both of these classes are sub-classes of "Roman_Office_Holder".
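The inference described above can be sketched in a few lines of plain Python. This is only an illustration of the idea, not how Pellet works: the dictionaries `office_to_class` and `subclass_of` are hypothetical stand-ins for the class definitions in awo.owl, and the step from "Roman_Office_Holder" up to "Office_Holder" is an assumption based on the classes that appear in awo-reasoned.rdf.

```python
# Minimal sketch of the inference, using plain tuples rather than a real
# OWL reasoner. All data structures here are illustrative assumptions.
AWO = "http://example.org/awo#"

# Asserted triples, mirroring awo.owl ("is" is the post's own property).
asserted = {
    (AWO + "Augustus", "is", AWO + "Roman_Emperor"),
    (AWO + "Augustus", "is", AWO + "Tribune"),
    (AWO + "Lucius_Cornelius_Sulla", "is", AWO + "Roman_Dictator"),
}

# Holding one of these offices places a thing in the corresponding class.
office_to_class = {
    AWO + "Roman_Emperor": AWO + "Roman_Imperial_Office_Holder",
    AWO + "Roman_Dictator": AWO + "Roman_Republican_Office_Holder",
}

# Sub-class links. The last entry is assumed from awo-reasoned.rdf.
subclass_of = {
    AWO + "Roman_Imperial_Office_Holder": AWO + "Roman_Office_Holder",
    AWO + "Roman_Republican_Office_Holder": AWO + "Roman_Office_Holder",
    AWO + "Roman_Office_Holder": AWO + "Office_Holder",
}


def infer_types(triples):
    """Return {subject: set of classes} implied by the 'is' statements."""
    inferred = {}
    for subject, predicate, obj in triples:
        if predicate != "is" or obj not in office_to_class:
            continue
        classes = inferred.setdefault(subject, set())
        cls = office_to_class[obj]
        # Walk up the sub-class chain until there is no parent left.
        while cls is not None:
            classes.add(cls)
            cls = subclass_of.get(cls)
    return inferred


types = infer_types(asserted)
for subject in sorted(types):
    print(subject, "->", sorted(types[subject]))
```

Both Augustus and Sulla end up typed as "Roman_Office_Holder" even though neither was asserted to be one directly, which is the point of running a reasoner over the ontology.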

Looking ahead, this simple (simplistic?) demonstration suggests a world in which it is possible to search a corpus of information - be it primary texts or secondary scholarship - for references to "Roman Office Holders" and be shown all documents (or other resources) that reference either Augustus or Sulla. That would be cool.

If you dig into awo-reasoned.rdf, you'll see that everything it says about "Augustus" it also says about the URI http://viaf.org/viaf/18013086. VIAF is the "Virtual International Authority File". Here I'm trying to (again, simply) explore the idea that if an author were to link to that well-known URI published by VIAF, then it would be discoverable that the document making the link referred to not only the Emperor but also to the more generic concept "Roman_Office_Holder". So imagine an Internet that can be queried for "All references to Roman office holders".
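A rough sketch of that lookup, under stated assumptions: the `documents` mapping and its example.org URLs are hypothetical, and the owl:sameAs handling is a simple union-find grouping rather than anything a production reasoner would do. It shows how a document that links only to the VIAF URI could still be found by a query for Roman office holders.

```python
from collections import defaultdict


def same_as_groups(pairs):
    """Group URIs connected by owl:sameAs into equivalence sets."""
    parent = {}

    def find(uri):
        parent.setdefault(uri, uri)
        while parent[uri] != uri:
            parent[uri] = parent[parent[uri]]  # path halving
            uri = parent[uri]
        return uri

    for left, right in pairs:
        parent[find(left)] = find(right)

    groups = defaultdict(set)
    for uri in list(parent):
        groups[find(uri)].add(uri)
    return list(groups.values())


def documents_about(class_members, same_as_pairs, docs):
    """Return documents linking to any URI equivalent to a class member."""
    expanded = set(class_members)
    for group in same_as_groups(same_as_pairs):
        if group & expanded:
            expanded |= group
    return sorted(doc for doc, links in docs.items() if links & expanded)


# owl:sameAs statements taken from awo.owl.
same_as = [
    ("http://example.org/awo#Augustus", "http://dbpedia.org/page/Augustus"),
    ("http://example.org/awo#Augustus", "http://viaf.org/viaf/18013086"),
]

# Members of Roman_Office_Holder, as a reasoner might report them.
office_holders = {"http://example.org/awo#Augustus"}

# Hypothetical documents and the URIs they link to.
documents = {
    "http://example.org/article-1": {"http://viaf.org/viaf/18013086"},
    "http://example.org/article-2": {"http://pleiades.stoa.org/places/423025"},
}

hits = documents_about(office_holders, same_as, documents)
print(hits)
```

Only the first hypothetical article is returned, because its link to VIAF is equivalent to the ontology's own identifier for Augustus.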

And we do want to support more complex queries: "All late Roman military sites in Syria within 30 kilometers of the findspots of LR coins or LR African Red-Slip". We're a long way from that but it's doable on the basis of existing technologies. And the content to support such queries is slowly coming online.

Some other bullet points in this world:
  • It's a world that I can think about because of conversations I've been having with my colleagues at ISAW, with the people running Pelagios, with people I've been writing papers and grants with. And others. There's nothing exceptionally original here. But the next step is to be part of "just doing it."
  • There needs to be a mechanism for bringing together existing RDF-based resources into a big pile of triples from which  a reasoner can extract interesting relationships. The work can't be done by hand by a few individuals. But if we just let the machines run wild, we'll end up with silly conclusions. We need to find the right balance of automatic processing and community sourcing to create an "Ancient World Inference Engine" or "Ancient World Semantic Reasoner" that is actually useful.
  • And that's probably an important principle: make it useful. Here are some thoughts on that:
    • When a "third party" resource links to a URI such as "http://en.wikipedia.org/wiki/Augustus" (or its VIAF equivalent), it would be nice if there were a javascript library that showed a menu offering links based on a JSON serialization of the 'knowledge' in awo-reasoned.rdf. This is an idea that has been floating around and whose time has come.
    • The network of links to stable URIs should be harvested so that the reasoner can work across the entire Ancient World Internet. The internet is the interface that allows community sourcing.
    • Existing resources that provide stability - such as Perseus, PAS, Pleiades, DBPedia, OpenContext, Nomisma.org, and many others -  should be incorporated. Keep new work to a minimum.
    • Another way of saying the above is that an "Ancient World Triple Store and Reasoner" should look to be a "pass through" resource reflecting the existing and developing state of the Internet rather than a destination itself.
    • The whole big pile of reasoned triples should be downloadable so that others can pay for the cycles to query it when they're doing something really complex. CC everything!
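The JSON serialization mentioned in the first sub-bullet above might look something like the following. This is a guess at a useful shape, not an existing format: the function name `menu_for` and the "uri"/"links" keys are hypothetical, and the sample triples are copied from awo-reasoned.rdf with prefixed names used for brevity.

```python
import json


def menu_for(uri, triples):
    """Collect everything 'known' about a URI, grouped by predicate."""
    entries = {}
    for subject, predicate, obj in triples:
        if subject == uri:
            entries.setdefault(predicate, []).append(obj)
    links = {predicate: sorted(objs) for predicate, objs in sorted(entries.items())}
    return {"uri": uri, "links": links}


AUGUSTUS = "http://dbpedia.org/page/Augustus"

# A few statements taken from awo-reasoned.rdf.
reasoned = [
    (AUGUSTUS, "owl:sameAs", "http://viaf.org/viaf/18013086"),
    (AUGUSTUS, "owl:sameAs", "http://example.org/awo#Augustus"),
    (AUGUSTUS, "rdf:type", "http://example.org/awo#Roman_Office_Holder"),
    (AUGUSTUS, "rdf:type", "http://schema.org/Person"),
]

menu = menu_for(AUGUSTUS, reasoned)
print(json.dumps(menu, indent=2))
```

A javascript library on a third-party page could fetch a document like this for each recognized link and render the entries as a popup menu.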
The above has started to wander a little bit so I'll end this post here. Let's see what happens in the next year or so...


Wednesday, September 14, 2011

ISAW Roman Pottery Reading Group: September 22

The 2011/2012 kick-off meeting of the ISAW Roman Pottery Reading Group is next Thursday, September 22 at 3:30. The topic is roughly "African pottery in the Eastern Mediterranean in Late Antiquity". As always, the readings don't cover the full range of what we could talk about:

  • Abadie-Reynal, C. 1989. “Céramique et commerce dans le bassin Égéen du IVe au VIIe siècle,” in V. Kravari, J. Lefort and C. Morrisson (edd.), Hommes et richesses dans l’Empire byzantin I. IVe-VIe siècle (Paris) 143-159.
  • Bonifay, M. 2005. “Observations sur la diffusion des céramiques africaines en Méditerranée orientale durant l’Antiquité tardive,” in F. Baratte et al. (edd.), Mélanges Jean-Pierre Sodini (Travaux et Mémoires 15), 565-81.
  • Majcherek, G. 2004. “Alexandria’s long-distance trade in Late Antiquity – the amphora evidence,” in Jonas Eiring and John Lund (edd.), Transport Amphorae and Trade in the Eastern Mediterranean. Acts of the International Colloquium at the Danish Institute at Athens, September 26–29, 2002, 229-237.
  • Bes, P.M. and J. Poblome. 2009. "African Red Slip Ware on the Move: the Effects of Bonifay’s Etudes for the Roman East," in J.H. Humphrey (ed.), Studies on Roman Pottery from Africa Proconsularis and Byzacena (Tunisia). Hommage à Michel Bonifay (Journal of Roman Archaeology Supplementary Volume 76), 73-91. [An incomplete version of this is available at: https://lirias.kuleuven.be/bitstream/123456789/243673/2/Poblomeforpdf.pdf]
The Abadie-Reynal is a classic and always worth looking at. It's important to take account of Bonifay's work so the Bes and Poblome article does that. The Majcherek gives a site specific view on the question, while also addressing large-scale historical issues. Should be fun.

Wednesday, June 22, 2011

Blogging my Digital Humanities 2011 Talk

I was all prepped to give a nice conversational version of my paper at Digital Humanities 2011 when my plane was delayed, so I had to spend an extra night in Boston, meaning my arrival in Palo Alto was bumped to after my allotted time. Oh, well. Here's a summary that presents some of what I was going to say.


The title was The Digital Materiality of Early Christian Visual Culture: Building on John 20:24-29 and the abstract is here.

My first "real" slide was a long-ish quote from the article Leonardi, P. 2010. "Digital materiality? How artifacts without matter, matter" in the online journal First Monday. It's online at http://www.uic.edu/htbin/cgiwrap/bin/ojs/index.php/fm/issue/view/315. Here's the quote:


I argue that treating materiality as the practical instantiation of theoretical ideas (like policies that allow women to vote help make material the idea that sexes are equal) or as what is significant in the explanation of a given context (like material evidence in a courtroom trial) provides a more useful framework for understanding how digital artifacts affect the process of organizing. I contend that moving away from linking materiality to notions of physical substance or matter may help scholars of technology integrate their work more centrally with studies of discourse, routine, institutions and other phenomena that lie at the core of organization theory, specifically, and social theory more broadly.
I've highlighted the bits I was going to focus on. "[M]ateriality as the practical instantiation of theoretical ideas" has useful overlap with how archaeologists think about materiality. We often try to "back port" from the objects we find to what people were thinking, but Leonardi's explicit connection between ideas/thought and material is enough to prime the pump within the context of a 20 minute paper.


The "moving away" idea gave me something to play off of. It's not that I disagree with Leonardi, it's that I like to think about the continuum of interplay between thought and matter that is enabled by digital surrogates of material culture. Here are two snippets from what I said:
Just as the creation of the surviving material record should be recognized as the cumulative action of many individuals, it is likely that exploration of that record will be enabled by many projects and institutions working within their own areas of expertise and with content specific to their domain (Heath 2010, Terras 2010). It is the interactions of a series of self-digitizing and independent communities – here Early Christian textual studies and Numismatics – that can recover relationships between physical object and human thought that is a primary goal of materiality as a methodological approach .... Digital materiality is therefore an act of transmission (Liu 2004) so that its deficiencies leave it open to criticism.
I'm being selective in quoting myself so you may want to read the above in context. It has a slightly different twist there.


More briefly, my point is that transmission of digital surrogates for material culture will provide opportunities for a new/re-emphasis on the relation between object and thought - that is, "materiality" - in the ancient world and in the study of the ancient world. It will probably do so elsewhere but I'm an Ancient Med. person so that's where I focus. If we think of digitization as de-materialization, it will enable new appreciation of the material.

But what leads me to say that?


My specific example is the relationship between the text of John 20:19-29 and physical manifestations of it. That's basically the story of "Doubting Thomas", a phrase meaning someone who requires physical, unambiguous proof before believing something.


Those verses from the Gospel of John chapter 20 are readily accessible online. Here's the KJV version. Or the New International Version. Do you read Macedonian? Go for it. Maybe you prefer Arabic?


The story culminates in Jesus saying, "Because you have seen me, you have believed; blessed are those who have not seen and yet have believed." You can get commentary on that phrase here (scroll to the bottom).


I threw in those links to show that the accessibility of the text doesn't come from the academy, that is, from the traditional home of Digital Humanities. Sure, the New Testament is both studied within universities and is available through DH stalwarts such as Perseus. But much of the digital action around this text comes out of the self-digitizing community that is the Christian web. I think that's cool and something we should be paying attention to. I discussed that more in the blog post "Digital Epistemology as Mediated through Tessellated Self-Digitizing Communities" over at Posterous.


But these links are not materiality. They're virtual all the way.


It's easy to materialize the text of John 20:19-24 in a modern context. Here's a low-res image of the text (minus the beginning of verse 19) from the United Bible Societies' Greek New Testament (GNT).




If we look more closely at verse 21 -  which in translation is "Again Jesus said, 'Peace be with you! As the Father has sent me, I am sending you.'" - we see:








The top image is the text in the GNT and the bottom is the critical apparatus or app. crit. If you look after the // in the app. crit., you see the Greek word παλιν followed by the Hebrew letter aleph. That's the symbol for a 4th century manuscript known as the Codex Sinaiticus. I've linked to the Wikipedia article for the codex but it's important for my talk that a digital facsimile of the manuscript is available at http://codexsinaiticus.org/. Go take a look, it's a cool site.


And just by way of introduction, the Codex Sinaiticus is a fourth century manuscript that is one of our earliest complete versions of the New Testament. Much of it is now in the British Library but came there indirectly from Saint Catherine's Monastery on the Sinai Peninsula.


Here's a screen shot of the site's version of John 20:21.




Counting down seven lines from the top of the middle column brings you to the Greek "εἶπεν οὖν αὐτοῖς πάλιν" or "He said to them again...". Note that in the GNT version the text is "εἶπεν οὖν αὐτοῖς [ὁ Ἰησοῦς] πάλιν" or "Jesus said to them again...". The brackets around "ὁ Ἰησοῦς" are an indication of some uncertainty about the reading of the "original" text. The Codex Sinaiticus delivers one component of the material basis of that uncertainty. That's digital materiality.

Here's another CS screenshot:



You can see the extent of correction as a second scribe addressed both basic mistakes and subtle issues of reading in the original product.

We can expand the digital materiality of this text by linking to a gold roundel - "small round disk" - in the collection of the American Numismatic Society: http://numismatics.org/collection/0000.999.51006 . Here's the screen shot of that page:


To the left of the central Jesus you see Thomas reaching out to touch Jesus' wounds. There's also a slightly irregular transcription of the Greek for Thomas' declaration "My Lord, My God" and of Jesus' response to the effect that those who have not required such proof are blessed.

Without going into too much detail, disks of this sort are believed to have been produced in Egypt. This piece doesn't come with a findspot but it is reasonable to invoke it next to the Codex Sinaiticus.

It is another materialization of John 20:19-29. One that combines text and image. It is a projection into physical space and across time of the message that Christian believers who did not have the opportunity to see Christ can be blessed.

These two objects - the Codex and the ANS disk - show that materiality does not remove the reader or viewer from our understanding of how texts worked. These materializations remind us that texts are physical objects that are responded to by people, and that one response is to change the materiality, as in editing the Codex. Certainly, one response is to debate the meaning of a text. The story of the Doubting Thomas is understood by many modern scholars as a statement against those who denied the humanity of Christ, among whom were the Gnostics, active in Egypt. The ANS disk is therefore part of an ongoing debate about Christ's nature.

The point of this post is not to go into great depth about what the pairing of these objects tells us about the role of materiality in Late Roman/Byzantine Egypt. Instead, I want to stress that the opportunity to think about that issue with such relative ease arises from acts of independent self-digitization that exist within wider contexts of topically related efforts also engaging in self-digitization. That leads to an environment in which intellectual risk taking is rewarded.

I don't want the series of inferences above to be pigeon-holed into either saying something about the past or about the present. I think we're at a stage of Digital Humanities where we can recognize that we are doing both. We do not know what questions about the past that modern Digital Materiality will allow us to ask, but I bet we're about to find out.

Thursday, March 31, 2011

Brief Anecdote about Discoverability, Sigma Tables, and the Athenian Agora

In the middle of today's meeting of the ISAW Roman Pottery Reading Group, the issue of "sigma tables" came up. These semi-circular marble tables are invoked by both of today's authors so it was natural to pause on the topic. At which point I mentioned, "There's one in the Agora and I bet it's online." A quick Google search on "marble sigma table agora" and we were a click away from the Agora's database.


That's object A 3869.

My only point is that because the object was easy to find via the public Internet, we were able to include it in our conversation. It was very useful to compare a specimen to Hudson's and Vroom's analysis and to the additional visual evidence they each gather.

As a reminder, here's what we read:
  • Nicholas Hudson. 2010. "Changing Places: The Archaeology of the Roman Convivium." AJA 114.4: 663-695.
  • Joanita Vroom. 2008. ‘The archaeology of late antique dining habits in the eastern Mediterranean: A preliminary study of the evidence’, in: L. Lavan, E. Swift and T. Putzeys (eds.), Objects in Context, Objects in Use. Material Spatiality in Late Antiquity (Late Antique Archaeology 5), Leiden and Boston: 313-361.

Friday, March 25, 2011

Cool pics of a Roman Hoard

I'm slow in getting round to this, but really, do visit http://ilfattostorico.com/2010/07/08/scoperte-piu-di-52000-monete-romane/. The pictures of the coins are cool. Even cooler are the pictures of the large vessel they were buried in. Those of us in numismatics frequently see the dry phrase, "Found in pot", or the more concise term, "Pot hoard". This page will help you visualize what that really means.

Here's a sample:





Monday, March 21, 2011

Roman Pottery Reading Group at the Institute for the Study of the Ancient World

A few of my ISAW/NYU colleagues and I have begun a "Roman Pottery Reading Group," which seems to be settling into a sort of every-other-week-ish-y schedule.

We began with three "Romanization" articles:
  • D. Malfitana, J. Poblome and J. Lund. 2005. "Late Hellenistic imports of eastern sigillata A in Italy. A socio-economic perspective," Babesch 80: 199-212.
  • Poblome, Jeroen and Michael Zelle. 2002. “The table ware boom: a socio-economic perspective from western Asia Minor” in Christof Berns, Henner von Hesberg, Lutgarde Vendeput and Marc Waelkens (eds.), Patris und Imperium, Leuven: 275-287.
  • Rotroff, S. 1997. "From Greek to Roman in Athenian Ceramics," in M.C. Hoff and S.I. Rotroff (eds.), The Romanization of Athens, Oxford: 97-116.
It was an added bonus that my colleague Billur Tekkök, in the States on a Fulbright Fellowship, could join us for that first session.

Next we read:
The point here was to look at representative samples of 40 years of publication from one site. Put simply: what has changed in techniques and approaches over that time? A little "inside baseball" but a fun conversation.

Next up is ceramics and dining:
  • Nicholas Hudson. 2010. "Changing Places: The Archaeology of the Roman Convivium." AJA 114.4: 663-695.
  • Joanita Vroom. 2008. ‘The archaeology of late antique dining habits in the eastern Mediterranean: A preliminary study of the evidence’, in: L. Lavan, E. Swift and T. Putzeys (eds.), Objects in Context, Objects in Use. Material Spatiality in Late Antiquity (Late Antique Archaeology 5), Leiden and Boston: 313-361.

We're meeting Thursday, March 31 at 3:00 PM Eastern Daylight Time. It's tempting to see if anybody wants to join us virtually. If really, truly, "yes", I'll see what we can do.

Friday, March 4, 2011

LRC/Phocaean Red Slip at Alexandria Troas

Anybody who would enjoy seeing a nice color picture of LRC/Phocaean Red-Slip rim sherds should take a look at figure 23 on page 15 of Stefan Feuser's article "The Roman Harbour of Alexandria Troas, Turkey" in volume 40.1 (2010) of The International Journal of Nautical Archaeology, doi:j.1095-9270.2010.00294.x.

From Typed Links to Annotations in Ancient Geography

I've been participating in the discussions of the Pelagios Project's plans to establish semantic web/linked data conventions for linking geographic information in the ancient world. Nomisma.org is listed as a partner and it's a good group of people who are coming together to think about the issue.

As always, the individuals and projects involved don't want to re-invent the wheel. And, also as always, some new work - even if it's just establishing a domain-specific use for existing standards - is necessary. That last is what I'm thinking about right now.

I mean the title of this post to establish an axis of complexity when it comes to relating a web-based resource to a geographic entity. A "typed link" is basically plain-old HTML with a little bit of RDF-sugar to say that the end-point is a geographic entity. I've already spoken about doing this in earlier posts. Here, let me start with the RDF/Turtle:

@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix geo: <http://www.w3.org/2003/01/geo/wgs84_pos#> .
@prefix powder: <http://www.w3.org/2007/05/powder#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

[] a dcterms:Location,geo:SpatialThing;
powder:describedby <http://pleiades.stoa.org/places/423025>;
rdfs:label "Rome" .
This is the RDFa:

<a href="http://pleiades.stoa.org/places/423025" typeof="dcterms:Location geo:SpatialThing" rel="powder:describedby" property="rdfs:label">Rome</a>

Again, that's pretty simple html that adds a little in-place information that the link is to a geographic entity that is defined at a particular URL. There are many tools that can parse that link and do interesting things like show a map. Hence the term I'm using here, "typed link". And I include as an "interesting thing" the now prosaic ability of a user to click on that link when it's rendered by a browser. Human readable and machine actionable. Win, win.
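As a rough illustration of "machine actionable", here is a sketch that pulls such typed links out of a page using only Python's standard library. It is not a conformant RDFa parser (it does not resolve prefixes or build triples); it just looks for `<a>` elements whose `typeof` names one of the two classes above. The class name `TypedLinkParser` and the sample sentence are hypothetical.

```python
from html.parser import HTMLParser

# The two classes the post suggests for marking a geographic entity.
GEO_TYPES = {"dcterms:Location", "geo:SpatialThing"}


class TypedLinkParser(HTMLParser):
    """Collect <a> elements whose typeof marks them as geographic."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._current = None

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        types = set((attr.get("typeof") or "").split())
        if tag == "a" and attr.get("href") and types & GEO_TYPES:
            self._current = {"href": attr["href"], "types": sorted(types), "label": ""}

    def handle_data(self, data):
        # Text inside the link becomes its human-readable label.
        if self._current is not None:
            self._current["label"] += data

    def handle_endtag(self, tag):
        if tag == "a" and self._current is not None:
            self.links.append(self._current)
            self._current = None


page = (
    '<p>Coins minted at <a href="http://pleiades.stoa.org/places/423025" '
    'typeof="dcterms:Location geo:SpatialThing" rel="powder:describedby" '
    'property="rdfs:label">Rome</a> and <a href="http://example.org/">elsewhere</a>.</p>'
)

parser = TypedLinkParser()
parser.feed(page)
parser.close()
print(parser.links)
```

The untyped link to example.org is ignored, while the Pleiades link comes back with its URL, its classes and the label "Rome", which is enough to hand off to a mapping tool.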

To be clear, with this post I am suggesting to my Pelagios colleagues that we use this or a similarly "light-weight" convention for the simple case of a link to a geographic entity. And yes, I don't mind if you use dcterms:Location, geo:SpatialThing or both. Those are the most widespread RDF Classes for indicating that a resource is a geographic entity.

An "annotation" is something different. The source document is trying to say something about the geographic entity. In this case, consensus seems to be building around the Open Annotation Consortium. That's a good thing on the "use existing work" principle. This time I'll start with a sentence: "Rome was the capital of the Roman Empire". Trivial, I know, but the point is to focus on the markup.

In RDF/Turtle, I want to say something like:

@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix geo: <http://www.w3.org/2003/01/geo/wgs84_pos#> .
@prefix oac: <http://www.openannotation.org/ns/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

_:oacEx a oac:Annotation ;
oac:hasTarget <http://pleiades.stoa.org/places/423025>;
oac:hasBody "was the capital of the Roman Empire" .

# choose one or both of dcterms:Location or geo:SpatialThing
<http://pleiades.stoa.org/places/423025> a dcterms:Location, geo:SpatialThing ;
rdfs:label "Rome" .


The top level concept is oac:Annotation, a class that encapsulates the relationship between a body (the thing annotating) and a target (the thing annotated). This RDF/Turtle basically says "There's a location 'Rome' that 'was the capital of the Roman Empire'." In RDFa, that's:
<?xml version="1.0" encoding="UTF-8" ?>
<html xmlns:dcterms="http://purl.org/dc/terms/"
xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#"
xmlns:oac="http://www.openannotation.org/ns/"
xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
base="http://example.org/doc-1.html"
>
<head></head>
<body>
<span id="annotation1" typeof="oac:Annotation" about="#annotation1" >
<a rel="oac:hasTarget" href="http://pleiades.stoa.org/places/423025">Rome</a> <span property="oac:hasBody">was the capital of the Roman Empire.</span>
</span>
<span style="display:none" about="http://pleiades.stoa.org/places/423025" typeof="dcterms:Location geo:SpatialThing"></span>
</body>
</html>

This is a first crack at the RDFa, so note the 'hidden' span that says the Pleiades URI is a dcterms:Location/geo:SpatialThing. I'm guessing I or somebody else can do better than that.
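
For anyone who wants to check the triples by hand, here's a minimal sketch in Python (standard library only) that writes the same annotation out as N-Triples. The function name annotation_triples is mine and purely illustrative; the URIs are the ones from the Turtle above, and I only emit the dcterms:Location typing to keep it short.

```python
# Illustrative sketch: the helper name is hypothetical, the URIs come from the post.
OAC = "http://www.openannotation.org/ns/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DCTERMS_LOCATION = "http://purl.org/dc/terms/Location"
ROME = "http://pleiades.stoa.org/places/423025"


def annotation_triples(node, target, body):
    """Return N-Triples lines for one annotation whose body is a plain literal."""
    # Escape backslashes and double quotes so the literal stays well-formed.
    escaped = body.replace("\\", "\\\\").replace('"', '\\"')
    return [
        f"{node} <{RDF_TYPE}> <{OAC}Annotation> .",
        f"{node} <{OAC}hasTarget> <{target}> .",
        f'{node} <{OAC}hasBody> "{escaped}" .',
        f"<{target}> <{RDF_TYPE}> <{DCTERMS_LOCATION}> .",
    ]


lines = annotation_triples("_:oacEx", ROME, "was the capital of the Roman Empire")
print("\n".join(lines))
```

Nothing here depends on an RDF library, which is the point: the "annotation" rung of the ladder is still only four statements.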

But the real point of this post is to propose that ladder of complexity. Use a combination of 'powder:describedby' along with dcterms:Location and/or geo:SpatialThing when that will suffice. Open Annotation is for more complex situations. Reactions?

Wednesday, March 2, 2011

Linking from Citation to Example in Numismatic (and other) Scholarship

I let myself follow a tangent today. It starts with noting that the article I'm preparing for publication, "Review of Ptolemaic Numismatics" by C. Lorber and A. Meadows, makes frequent reference to coin types described in J. Svoronos, Ta nomismata tou kratous ton Ptolemaion. Athens, 1904-1908. It is an obvious feature of such a publication that those references should lead readers to information about those coins.

To start on the journey towards such linking, I created URIs for all coin types defined in Svoronos' typology at nomisma.org. See http://nomisma.org/id/svoronos-1904-1000. There's very little description there, and what is there is cribbed from C. Lorber's translation at http://www.coin.com/images/dr/svoronos_text.html.

Now, if you go to this paragraph in Lorber and Meadows, which makes reference to Svoronos, you'll see that the link to "Sv. 1424" is live. Look towards the end of the paragraph. And note that it's possible to refer to single <p> elements in the article. That's because each one has an @id with a unique value. That's cool and important.

Follow the link to http://nomisma.org/id/svoronos-1904-1424 and you'll see further links to the ANS collection and to coinproject.com. The former is a rock-solid stable URI but the coin hasn't been photographed (hint, hint). The latter is to an interesting project that is digitizing type corpora for many series of coins. As the editor of ISAW Papers I don't have to worry if it's super-stable. I rely on nomisma.org to provide reasonable links and to keep them current.
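
The step from a printed citation to a nomisma.org URI is mechanical enough to sketch. This Python snippet assumes the only input form is "Sv." followed by a type number, and that every type follows the URI pattern of the two examples in this post; the function name svoronos_uri is hypothetical and not part of nomisma.org.

```python
import re

# Pattern taken from the two example URIs in the post; assumed to hold for all types.
BASE = "http://nomisma.org/id/svoronos-1904-"
CITATION = re.compile(r"^\s*Sv\.\s*(\d+)\s*$")


def svoronos_uri(citation):
    """Turn a citation such as 'Sv. 1424' into a nomisma.org URI, or return None."""
    match = CITATION.match(citation)
    if match is None:
        return None  # not a form this sketch understands
    return BASE + match.group(1)


print(svoronos_uri("Sv. 1424"))
```

Anything more elaborate than a bare type number (sub-types, ranges) returns None here rather than a guess.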

The end result is a hint of a richly linked and illustrated future. Again, cool. I'd like to cross-the-bridge (as it were) and deliver images of Sv. 1424 while readers are still within the "environment" of Lorber and Meadows. But the first step is implementing such links, then we can work on the user experience.

In other news.... there is now a github for ISAW Papers at http://github.com/sfsheath/isaw-papers.

Tuesday, March 1, 2011

"Archival" and "Presentation" versions of (x)html-based scholarship

Briefly...The presentation version changes the extension to ".html", adds some formatting to fix the page width and to justify the body paragraphs. It also adds an appendix of links to named entities at the end. That last suggests an interesting future.

The goal here is to maintain a focus on an archival version with very little formatting in it, while also exploring what the "nicer" presentation version can look like. Eventually this content will appear in a CMS-like environment. That should be attractive and functional so I'm figuring out what that means. In time, I'll add features along the lines of "pop-up" windows for geographic entities and the like. Not sure exactly what that entails but we'll find out as we go along.

And I'll move this to github in the near-ish future.

Wednesday, February 23, 2011

Test Bed for (X)HTML Conventions for Scholarly Publication

The main reason I joined the Institute for the Study of the Ancient World at NYU was to be part of initiating a program of digital publication of peer-reviewed scholarship. We haven't announced anything formally and this blog post isn't that announcement. It is the beginning of a nuts-and-bolts conversation about the markup of digital scholarship that is intended to encourage long-term viability, flexible re-use, and easy display (among many other things).

To get right down to business, http://dl.dropbox.com/u/17002562/isaw-papers-preprint.xhtml is the very temporary URL for a preprint version of "Review of Ptolemaic Numismatics, 1996 to 2007" by Catherine Lorber and Andrew Meadows. I'm very grateful to Andy and Cathy for their willingness to be part of this experiment. Their work is largely done. Now it's up to me to make progress on the markup and I'm hoping to do that in a very public way.

But where to begin the conversation? I think the best approach is to admit I'm in the middle of things and just start laying out issues and thoughts. Keep in mind that everything is subject to change...
  • The format for ISAW digital publications is XHTML with RDFa. XHTML (for now 1.1 but moving to XHTML5) is a widely supported standard with excellent tooling that is directly viewable in many contexts. That makes it appropriate for long-term archival storage of born-digital scholarship.
  • Internal reference structures are important. For now this means each <p> element has an @id. div's of class 'section' also have @id attributes. This is in anticipation of using the semantic elements of HTML5.
  • Named entities will be tagged with links to stable resources describing those entities. For geography, Pleiades. For many other entities, Wikipedia. See below for RDFa patterns.
  • Existing ontologies/vocabularies will be used whenever possible. Geographic entities are typed as "dcterms:Location". That sort of thing.
  • Basic constructs for marking up bibliography and footnote-like structures are lacking for HTML-based markup languages. There are lots of semi-complete "best practices" but narrowing these down to a consistent and flexible convention will be an important process.

Looking ahead:
  • Multiple formats will be supported. We will distribute this text as "raw" valid xhtml. It will be hosted in a more interactive environment that does slick things like make maps, etc. Epub, pdf... all those are coming. Again, the ease with which a base XHTML representation can be converted to these other formats is one reason to use XHTML.
  • We'll use CC licenses. Right now the document is CC-BY-NC-ND. We'll drop the ND eventually, perhaps the NC as well. The preprint is ND as a signal that a better version is coming from us.
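
The internal-reference convention in the first list (every <p> and every div of class 'section' carries an @id) is easy to check mechanically. Here's a minimal sketch in Python, standard library only. The class name MissingIds and the sample markup are mine, for illustration, and aren't part of any ISAW tooling.

```python
from html.parser import HTMLParser


class MissingIds(HTMLParser):
    """Record <p> and <div class="section"> start tags that lack an id attribute."""

    def __init__(self):
        super().__init__()
        self.missing = []  # list of (tag, line number) pairs

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        is_section = tag == "div" and "section" in classes
        if (tag == "p" or is_section) and not attrs.get("id"):
            self.missing.append((tag, self.getpos()[0]))


sample = """<div class="section" id="s1">
<p id="p1">First paragraph.</p>
<p>Second paragraph, no id.</p>
</div>"""

checker = MissingIds()
checker.feed(sample)
checker.close()
print(checker.missing)
```

A check like this could run before each new version goes out, so that no paragraph slips through without something to link to.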


A word on RDFa (a standardized way of embedding information in XHTML pages)...

The basic pattern that I'm using to mark up named entities is illustrated by the sentence:
In a study of tax receipts from early Ptolemaic <a class="citation"
href="http://pleiades.stoa.org/places/991398"
typeof="dcterms:Location" rel="iana:describedby"
property="rdfs:label">Thebes</a>...


That produces the RDF/Turtle
[ a dcterms:Location ;
rdfs:label "Thebes"@en ;
iana:describedby <http://pleiades.stoa.org/places/991398>].
You can see the turtle for the whole document at http://bit.ly/hJjgcx.

An "English" equivalent of the turtle snippet is 'There is a site in the text with label "Thebes" and a description at http://pleiades.stoa.org/places/991398.'

I like the use of the 'describedby' @rel value here. It's defined in the IANA's register of rel values (http://www.iana.org/assignments/link-relations/link-relations.xml). I take the semantics to be "I'm not saying I'm linking to Thebes itself, only to a description of it." That seems nice and "semantic webby".
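
To show how little machinery the pattern needs, here's a rough sketch in Python that pulls the same information out of the sample sentence using only the standard library. It is not an RDFa processor. It just reads the attributes of <a class="citation"> elements, and the class name CitationLinks is my own invention.

```python
from html.parser import HTMLParser


class CitationLinks(HTMLParser):
    """Collect href, typeof, rel and link text for <a class="citation"> elements."""

    def __init__(self):
        super().__init__()
        self.found = []
        self._current = None  # the citation link being read, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and "citation" in (attrs.get("class") or "").split():
            self._current = {
                "href": attrs.get("href"),
                "typeof": attrs.get("typeof"),
                "rel": attrs.get("rel"),
                "label": "",
            }

    def handle_data(self, data):
        if self._current is not None:
            self._current["label"] += data

    def handle_endtag(self, tag):
        if tag == "a" and self._current is not None:
            self.found.append(self._current)
            self._current = None


snippet = '''In a study of tax receipts from early Ptolemaic <a class="citation"
href="http://pleiades.stoa.org/places/991398"
typeof="dcterms:Location" rel="iana:describedby"
property="rdfs:label">Thebes</a>...'''

parser = CitationLinks()
parser.feed(snippet)
parser.close()
entity = parser.found[0]
print(f'[ a {entity["typeof"]} ;')
print(f'  rdfs:label "{entity["label"]}" ;')
print(f'  {entity["rel"]} <{entity["href"]}> ] .')
```

A real RDFa parser also handles prefixes, language tags and nesting. This sketch only shows that the markup carries enough information to rebuild the Turtle.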

There's more to come but I'm getting this out there just to get the ball rolling...

Monday, February 21, 2011

Quick poll: Worldcat, Library of Congress, or Both

There are lots of ways of encoding bibliographic data on the web, but this post isn't about that problem. Instead, I'm wondering what is "the community's" preference between Worldcat and the Library of Congress when creating Semantic Web/Linked Open Data references.

As an example, the URIs http://www.worldcat.org/oclc/829279 and http://lccn.loc.gov/74155758 each lead to information about John Hayes' Late Roman Pottery published in 1972.

Which one of these is preferable as the long-term description of this volume, Worldcat or LOC? The use-case is a digital publication with bibliography that ideally includes a link to one or the other or both for all printed volumes or other appropriate entities.

Perhaps a discussion will ensue in the comments but here are some quick issues:
  • There are multiple URIs for that one volume in Worldcat. http://www.worldcat.org/oclc/462730938 gets you to the Danish Union Catalog.
  • There are still concerns about the licensing of Worldcat data.
  • The LOC record is to a physical volume in a single national library and may not be intended as a description of the abstract concept (e.g. a FRBR Work). I don't know that Worldcat URIs solve this problem but they have the implication of a higher level of abstraction.


Votes and/or comments are appreciated.

Monday, February 7, 2011

Quick poll: Wikipedia or DBPedia?

I've created a poll near the upper right of this page. In longer form: when making persistent "Linked Data/Semantic Web" references to concepts described in Wikipedia, is it "best practice" to link to Wikipedia or to DBPedia? As in, "http://en.wikipedia.org/wiki/Augustus" or "http://dbpedia.org/resource/Augustus"?
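
Whichever way the vote goes, the two forms are mechanically related for the simple case: DBpedia resource names follow English Wikipedia page titles. A small Python sketch, assuming an English Wikipedia URL of the plain /wiki/Title form (the function name dbpedia_uri is hypothetical):

```python
from urllib.parse import urlsplit


def dbpedia_uri(wikipedia_url):
    """Map an English Wikipedia page URL to the corresponding DBpedia resource URI."""
    parts = urlsplit(wikipedia_url)
    prefix = "/wiki/"
    if parts.netloc != "en.wikipedia.org" or not parts.path.startswith(prefix):
        raise ValueError("expected an en.wikipedia.org/wiki/... URL")
    title = parts.path[len(prefix):]
    return "http://dbpedia.org/resource/" + title


print(dbpedia_uri("http://en.wikipedia.org/wiki/Augustus"))
```

So a document that records one form can generate the other later, at least for pages that are not redirects or disambiguations.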

Friday, February 4, 2011

Access to Roman Art: Observations by Peter Stewart

The last few times I've gone to speak about issues of scholarly communication/digital humanities/digital archaeology/etc, I've opened up with a quote from Peter Stewart's 2008 book The Social History of Roman Art [Worldcat]. That's a great little book, and I was particularly pleased when reading it that Stewart is explicit about the effects of access to evidence and images on his selection and narrative. And I was further pleased that he talks about his personal efforts to solve those problems. I'll illustrate this by a series of passages given in their order of appearance:
Unfortunately, my comments in the Introduction about the problems of acquiring images were born out in the book's preparation, and I had very considerable difficulties and delays in acquiring most of the images reproduced here. I therefore owe a special debt to those who helped me to obtain pictures, and to those image-providers who waived or reduced reproduction fees. (p. xv)
Then from that introduction:

To an extent, however, these are all obvious problems of evidence and interpretation which are familiar in any branch of historical study. Other problems are insidious and lie unremarked in the methodological hinterland of books like this one. I have said that the use of examples must be highly selective. But behind any book on Roman art, there are processes of selection that are largely beyond the author’s control. Most Roman art historians will never, in their lifetime, see more than a tiny percentage even of the more significant works that survive. This is not simply because of the magnitude of this great body of material. It is also because most pieces are inaccessible. Many of the finest and most interesting Roman antiquities are in private collections, and many of these are unpublished, sometimes because of scholars’ anxieties about the legality of their origins. However works preserved in museums can be at least as difficult to access. Few museums are able to exhibit more than a small minority of the objects they hold. It is not infrequent (or surprising) for some of the objects in storage to be, effectively, lost, and for other reasons it may be hard for specialists to see material, particularly if it has been excavated recently. New discoveries may take many years to become familiar within the field, and even longer to filter into general, synoptic studies of Roman art.

So, for a variety of reasons, authors depend heavily on other people's publications of Roman art, where they exist, and on their illustrations. The photographs themselves are usually supplied by the museums that own the work concerned, or sometimes by commercial agencies. In many cases no photograph exists, and new photography may not be permitted. In other cases, the acquisition of photographs proves lengthy or impossible. Moreover, the photographs (especially colour images) and the permission to reproduce them in print can be extremely costly both for individual authors and for their publishers. (p. 8)

The passages need to be read in context. It's not an angry book, and these introductory comments are followed by an interesting and challenging extended essay on the topic indicated by the title. I can highly recommend it. But back to the issue of access, here's a passage from the ending Bibliographical essay:
Finally, the photo-sharing website flickr.com contains thousands of images relevant to Roman art, many of them with 'Creative Commons' copyright licenses that make them easy to use legitimately for, e.g. educational purposes. Within that site the 'Chiron' group especially is dedicated to making images available for classical teaching and research. This site carries many of my own photographs (under the screen name 'Tintern'), including colour images of the House of the Vettii and other sites mentioned in this book. (p. 174)
So mad props to Dr. Stewart for raising the issue of access and then doing something about it. A book from CUP in which the author cites his flickr.com account? That's progress.

Monday, January 31, 2011

In-house commenting systems may not be necessary

Somewhat wishy-washy title, I know.

But here's my point, I look forward to a world of stable URIs for intellectual content in which responses to scholarship and primary data are distributed around the Net.

A case in point, my NYU colleague Chuck Jones blogged about the digitization of some of Blegen's diaries by the American School of Classical Studies.

If you look at the bottom of the post, you'll see that he included the Pleiades URIs for both Mycenae and Tiryns.

It is now the case that a Google search for the Tiryns URI lists Chuck's AWOL post.

Assuming that ASCSA doesn't move that resource to a different URI and that the post remains available, stable URIs for Tiryns and Mycenae have now been permanently associated with the ASCSA resource. And that with the publisher of the information doing nothing. (Though it would be nice if ASCSA ditched the "index.php" from their URIs. See here.)

And note that I'm walking a fine line in this post. The Pleiades URIs that Chuck included explicitly in his post don't appear in the text of mine. I don't see any reason to clog up the Google search with this meta-meta-commentary.

By way of slightly living up to the title, my point is that such a decentralized "commenting system" should be encouraged. If you're able to link from your content to a stable URI that more-or-less represents the same concept, do so. And use such URIs when you're talking about other people's work. That will encourage a distributed network of publication and response that is robust, open and encompasses many forms of expression from tweets, to blogposts, to more formal work, and beyond.

Wednesday, November 3, 2010

Responses to "Progress on Museum URIs"

Three people responded to yesterday's post on museum URIs.

Leif Isaksen left a comment to the effect that he's not too concerned about differing base URIs for museum collections. I agree that there are worse things than the string "collection." in "http://collection.britishmuseum.org/object/YCA62958". The original explanation was to reduce load on an individual server. Without meaning to get too technical, the "/object" can be an effective load reducer by passing requests to a proxy. Bottom line: in an ideal world, I'd drop the "collection.", but I'm not too worked up about it.

Eric Kansa responded on his blog. His point had an interesting overlap with an e-mail I received. I won't quote that in its entirety as the author could have made it public if s/he wanted to. Here's a snippet:
but to me it seems a very bad idea to think that only museums can claim the right to designate URIs for their objects; there should be a standard that can be used by museums as well as by scientists outside of museums...
I took this as responding largely to
2. In order to avoid that everybody invents a new URI for the same
object, there should be one authority known to the whole world that
assigns such a URI.

3. This authority is naturally the museum that keeps the object,
because it is the only institution that can verify that two
different use cases of museum object URIs actually describe the same
thing.
Taking Eric's and Anonymous' comments together, I read them as calling for a multi-vocal internet in which many agents can assert an identity for an object, with those identities together forming a distributed and diverse commentary on the human past. I totally agree. To be self-critical, I may well have mis-read M. Doerr's e-mail. If he's calling for recognition of the exclusive right of museums to identify their objects, that's a non-starter. It's neither the right thing to do nor is it possible. On first reading, I took his e-mail to represent a welcome assumption of responsibility by museums to provide a locus of stability for reference to their collections. But to be clear, objects will have multiple identifiers. Referring back to a common identifier promoted by and discoverable at the holding institution will ease the process of recognizing that two or more identifiers refer to the "same thing". That will itself promote the idea of a discoverable and multi-vocal discussion about the past.

Tuesday, November 2, 2010

Progress on Museum URIs

I'm including the full text of an e-mail sent by Martin Doerr of the Center for Cultural Informatics on Crete. It's been forwarded to me by a couple of people and there's a call for comment towards the end so it seems to be a public document. That's good because it's an excellent step forward in promoting stable URI's for museum collections. From my perspective, it mostly speaks for itself. Section 7 did cause some concern:
...

Under this consideration, Dominic proposes for the British Museum (http://www.britishmuseum.org/), that all objects of the Museum should be identified on the Semantic Web by the following: http://collection.britishmuseum.org/object/ followed by the "PRN number".

For instance, the Rosetta Stone has the PRN number: YCA62958, hence the "official" URI of the Rosetta stone is: http://collection.britishmuseum.org/object/YCA62958 . This URI should never become direct address of a document.
Just to be clear, if a user cuts-and-pastes 'http://collection.britishmuseum.org/object/YCA62958' into an address bar, or a document links directly to that (which I've just done), that should produce a human readable page. I'd like to see that happen without redirection. If you redirect to that same URL with ".html" appended, then authors will cut-and-paste that string into their documents. If a good non-crufty URI exists, that's what should appear in address bars and that's what should stand as the 'permalink'.

More generally, URIs should promote unity and overlap, not division, between the "semantic web" and the "plain-old web" (POW).
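
The rule in the e-mail (a base URL chosen by the museum, extended by the inventory number) amounts to very little code, which is part of its appeal. A sketch in Python, where the function name museum_object_uri is hypothetical and the only real values are the British Museum base and PRN number quoted above:

```python
from urllib.parse import quote


def museum_object_uri(base, inventory_number):
    """Append a percent-encoded inventory number to a museum's chosen base URL."""
    if not base.endswith("/"):
        base += "/"
    # Encode anything unusual in the inventory number so the result stays a valid URI.
    return base + quote(inventory_number.strip(), safe="")


print(museum_object_uri("http://collection.britishmuseum.org/object/", "YCA62958"))
```

Inventory numbers containing spaces or slashes are the awkward case. Percent-encoding them keeps the URI valid but makes it less friendly to cut-and-paste.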

Section 7 also endorses URIs that have a different domain name from the institution itself, e.g. the "collection." in front of "britishmuseum.org". I don't like that. The reason given is to avoid the implications of name changes in the future. Ugh. Institutions should formally endorse the URIs they mint and make them as simple and short as possible. This decision should be taken at the highest levels of the institution. In the BM's case, that may mean the 25-member Board of Trustees.


Finally, the excellent and useful Europeana is mentioned. I'll take this opportunity to note that while http://europeana.eu/portal/record/00401/034BEA5CC6F88ADC6E7DCF5D7C5FECEA8FF85528.html works, http://europeana.eu/portal/record/00401/034BEA5CC6F88ADC6E7DCF5D7C5FECEA8FF85528 doesn't. It should.



Dear colleagues,

I'd like inform you about our discussion today with Dominic Oldman,
Deputy Head of Information Systems, British Museum, his team and
representatives of the Research Space project
(http://sites.google.com/site/rspaceproject/the-team):

1. It is necessary that museum objects are uniquely identified by
suitable URIs in Semantic Web applications.

2. In order to avoid that everybody invents a new URI for the same
object, there should be one authority known to the whole world that
assigns such a URI.

3. This authority is naturally the museum that keeps the object,
because it is the only institution that can verify that two
different use cases of museum object URIs actually describe the same
thing.

4. This URI should be derived in a simple way from the inventory
numbers published in exhibition catalogues, on on-line museum
catalogue access or by asking museum staff, to avoid an error-prone
equivalence matching process.

5. This URI should have a form that enables any museum that wishes
to do so to provide a Linked Open Data service resolving to the
description of that object. Note, that this URI must not be the URL
of an existing document about the object, but it must activate a
standard mechanism prescribed by the Linked Open Data Initiative to
redirect to a document saying what the URI means.

6. This museum object URI will continue be useful for communicating
uniquely about the object, even if the museum never will install an
LoD service, or if the way of dealing with LoD resolution requests
will change.

7. The way to create this URI should be the following: The museum
decides a base URL that will be extended by the inventory number of
the object. The base URL could be within the domain name of the main
museum Website, but in order to stay clear of possible name change
of the latter, a new domain name might be advisable. Also, for
larger museums, resolving LoD access requests to object information
may cause some server load, that can more easily be balanced with a
second name.

Under this consideration, Dominic proposes for the British Museum
(http://www.britishmuseum.org/), that all objects of the Museum
should be identified on the Semantic Web by the following:
http://collection.britishmuseum.org/object/ followed by the "PRN
number".

For instance, the Rosetta Stone has the PRN number: YCA62958, hence
the "official" URI of the Rosetta stone is:
http://collection.britishmuseum.org/object/YCA62958 . This URI
should never become direct address of a document.

It would be good, if Europeana experts to comment, if they regard is
an adequate approach for Europeana, and could transfer this message
to other museums and providers to follow this practice.

I intend to present this on the CIDOC Conference in Shanghai. I
would be very glad if I and Dominic could get a response within the
next week, if you endorse the procedure, and if you will support us
to spread the practice.

If you need further clarifications, please let me know as soon as
possible.

Best wishes,

Martin
--

--------------------------------------------------------------
Dr. Martin Doerr | Vox:+30(2810)391625 |
Research Director | Fax:+30(2810)391638 |
| Email: martin@ics.forth.gr |
|
Center for Cultural Informatics |
Information Systems Laboratory |
Institute of Computer Science |
Foundation for Research and Technology - Hellas (FORTH) |
|
Vassilika Vouton,P.O.Box1385,GR71110 Heraklion,Crete,Greece |
|
Web-site: http://www.ics.forth.gr/isl |
--------------------------------------------------------------

Thursday, October 28, 2010

Ancient Mediterranean Objects at the NMNH

Using posterous.com to track URIs. Here's an Ancient Mediterranean object at the National Museum of Natural History.

If you're reading this at http://mediterraneanceramics.blogspot.com/ , that's part of the experiment as well.

Saturday, October 16, 2010

Change Happens (if it can)

As the result of an e-mail exchange with Neel Smith, one of the designers of the Canonical Text Service Protocol, I've come up with the following formulation:
If a character in a URL can change, it will.
I'm not the only person to think this but I just wanted to get that thought out in simple, direct language.

But what do I mean? Take Worldcat URLs such as http://www.worldcat.org/oclc/502674170. That "www." is annoying and should not be part of the URL that Worldcat presents as its permanent identifier for the book. At some point in the future, somebody there will realize this and remove those unnecessary characters. But http://worldcat.org/oclc/502674170? Now you're talking! And look, it already works.

It's true that the "oclc" could be shortened so maybe I need to qualify the formulation, but I'm not going to for the following reason. Changing those characters would risk collision with other identifying schemes that Worldcat supports such as http://worldcat.org/isbn/0754677737 . The 'www.' is unstable because it can be removed without breaking anything.

The simple formulation stands: If a character in a URL can change, it will.

The implication is, "be aggressive about removing all unnecessary characters from your URLs." The following is a horror-show:
http://www.worldcat.org/title/digital-research-in-the-study-of-classical-antiquity/oclc/502674170
It just looks unstable. Leading me to another formulation:
If a URL looks unstable, it is.
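
In that spirit, here's a sketch of "being aggressive" in Python: it reduces any of the Worldcat URLs above to the shortest form that this post shows working. The function name short_worldcat_url is hypothetical, and the assumption that only the trailing /oclc/ or /isbn/ segment matters is mine, based on the examples here.

```python
import re
from urllib.parse import urlsplit

# Keep only the identifying scheme and number at the end of the path.
IDENTIFIER = re.compile(r"/(oclc|isbn)/([0-9Xx]+)/?$")


def short_worldcat_url(url):
    """Drop 'www.' and any title slug, keeping the identifier scheme and number."""
    match = IDENTIFIER.search(urlsplit(url).path)
    if match is None:
        raise ValueError("no /oclc/ or /isbn/ identifier at the end of the path")
    scheme, number = match.groups()
    return f"http://worldcat.org/{scheme}/{number}"


long_form = ("http://www.worldcat.org/title/"
             "digital-research-in-the-study-of-classical-antiquity/oclc/502674170")
print(short_worldcat_url(long_form))
```

Note that the 'oclc' and 'isbn' segments survive, for the collision reason given above.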

Tuesday, September 21, 2010

Discussing Citation by Example

I've started a set of pages at the Digital Classicist Wiki on the topic of Citation in digital scholarship. In progress, under construction, etc., etc., etc.

The goal is to move existing practice towards a broad understanding of how to make citations to such categories of evidence as primary written sources, geographic entities, cataloged objects, and secondary scholarship so that those citations are:
  • Clearly identified in a robust yet rich fashion
  • Recognizable by automatic agents
  • To resources that are stable over the long-term

But I don't think it will be possible to establish and drive adoption of one very detailed standard. Better to have a simple notation - I follow others in suggesting 'class="citation"' for (x)html - that can indicate the presence of more detailed markup. I'm a fan of RDFa so I further discuss that on the page "Citations with added RDFa".

The Digital Classicist community is pretty open and I'm very grateful to G. Bodard (a.k.a palaeofuturist) for saying the equivalent of "Go for it." when I raised the possibility of hosting these materials in his realm.

There's a category for all the pages and I hope that list will grow.

Wednesday, September 8, 2010

References that just work (but I understand it's not that simple...)

Go to Google. Type in "John 20:24", then hit return. You can even click the "I'm Feeling Lucky" button. Or here's a direct link.

Or try the same thing in Bing (which provides results for Yahoo), and Altavista.

As you'll see, all three searches get you to the relevant passage of the Gospel According to John. And if you poke around on the Biblegateway site, you'll see various translations (but where's the Vulgate?).

That's impressive. It indicates that human readable references can become so stable that automated agents are able to correctly translate them into links to particular chunks of primary text.

Here are some variations on the theme (all in Google):
"jean 20:24" (at google.fr): Not spot on, but pretty close.
"1 John 2:1": That's a reference to the first epistle of John. Entered into the "address bar" in Chrome. Seems to work.
"John 3": Unqualified chapter reference. Good to go.
"ephesians 1:2": Works.
"eph. 1:2": That abbreviation is OK.
"eph 1.2": Things become fuzzier if I don't use the ':' that is conventional in references to Christian scripture.
"Ephesians 2:4-10": Spans work as well, when properly formatted.

Again, I think this is interesting. Taking the New Testament as a corpus of Ancient Mediterranean texts that were written between the mid-first and third (at the latest: the Epistle of James 1 isn't definitively quoted until Origen) centuries AD makes it relevant to the study of the Ancient World as a whole. As a corpus, it's been around for a long time. Athanasius's letter of AD 367 is one conventional date for the determination of what was in, and what was out.

Those comments aside, the point remains that it is possible to automatically reverse engineer the citation scheme of a very stable corpus. I guess one caveat is that I don't absolutely know that Google, Bing, etc. haven't special cased strings that are plausibly references to the NT. Any ideas?

My larger goal is to think about references to so-called "primary texts" that just work. Given the above, my ad hoc, working definition of "primary text" is any text with a sufficiently stable name and citation scheme that search engines can find it. Sure, that's circular and incomplete, but it will do for now.

Let's try some others:
"gilgamesh 3": Muddled.
"gilgamesh tablet 3": Better.
"Iliad 23": Not bad. No Greek.
"Iliad 23.100": Individual line references don't work.
"Homer Iliad 23.100": Not better.
"Quran 32": I see it as the third link.
"hemingway, the old man and the sea": For comparison. Wikipedia is the top page for me; that's not the text itself. And Amazon is up there, as in the work is in copyright so I'd have to pay. Not sure I want to follow the links that say I can download the text for free.

A major distinction between references to NT texts and the second group is the ability of Google to handle full chapter and verse ('n:n') references. That doesn't seem to work for the Iliad. That's worth exploring.
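
I have no idea what the search engines actually do internally, but it's worth seeing how little it takes to recognize the 'n:n' shape. Here's a sketch in Python of a parser for references like the ones tried above. The pattern and the function name parse_reference are mine, and it deliberately accepts '.' as well as ':' so that "Iliad 23.100" parses too.

```python
import re

REFERENCE = re.compile(
    r"""
    ^\s*
    (?P<book>(?:[1-3]\s+)?[A-Za-z]+\.?)  # optional ordinal, then a name or abbreviation
    \s+
    (?P<chapter>\d+)
    (?:[:.](?P<verse>\d+)                # ':' is conventional; '.' is tolerated here
       (?:-(?P<end>\d+))?                # optional end of a verse span
    )?
    \s*$
    """,
    re.VERBOSE,
)


def parse_reference(text):
    """Split a reference into book, chapter, verse and span end, or return None."""
    match = REFERENCE.match(text)
    if match is None:
        return None

    def as_int(name):
        value = match.group(name)
        return int(value) if value is not None else None

    return {
        "book": match.group("book"),
        "chapter": as_int("chapter"),
        "verse": as_int("verse"),
        "end": as_int("end"),
    }


for ref in ["John 20:24", "1 John 2:1", "eph. 1:2", "Ephesians 2:4-10", "Iliad 23.100"]:
    print(ref, "->", parse_reference(ref))
```

Strings with extra words, such as "gilgamesh tablet 3" or "Homer Iliad 23.100", return None in this sketch. Recognizing the shape is the easy part. Knowing which text a book name refers to, and having a stable page to send the reader to, is where the real work lies.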

If I go to Perseus and use the search box at the upper right, "homer iliad 23.100" doesn't work directly. Nor does "iliad 23.100". But "Hom. Il. 23.100" does. If I try that string in Google (link), it gets me to the Chicago version of the Perseus texts (via the 2nd ranked link when I tried it.). [I'll take this opportunity to note that the Chicago Perseus is wicked, and that it's likewise wicked cool that Perseus texts are licensed so this redundancy is possible.]

That kind of variation is one of the reasons I parenthetically qualified the title of this post. References to "primary texts" - and other texts for that matter - are not simple. In this post - as is often my wont - I've let myself be drawn along by current practice. I really do like to see what people are actually doing and how data actually works on the Internet. If you want a more substantive discussion of the problems of citation, I highly recommend Neel Smith's "Citation in Classical Studies" in DHQ 2009. Here's the abstract:
Citation practice reflects a model of a scholarly domain. This paper first considers traditional citation practice in the humanities as a description of our subjects of study. It then describes work at the Center for Hellenic Studies on an architecture for digital scholarship that is explicitly based on this model, and proposes a machine-actionable but technologically independent notation for citing texts, the Canonical Text Services URN.
For now, let me say that it is correct for Google (via Biblegateway) to dereference a citation to John 7:53-8:11 (the Pericope Adulterae) or John 5:7 (the Comma Johanneum). Neither may have been in the "original" text of the Gospel of John, but references to them are semantically clear and have been used "in the wild" so need to be handled. But note that Google prioritizes discussion of the CJ over the text (or at least does when I'm trying it now). Again, see N. Smith on the implications of such variation.

Clearly it helps to have a committed body of believers and/or scholars working on very old texts. Energy and time make for stable references. But there is variability in functionality even within that group. I guess the long-term question is how do we move more texts into the category of "just working"? I am assuming we want to. And how do we support co-existence of the simple "reference following" alongside what Neel describes? Both are useful.

Monday, August 30, 2010

Numbered Paragraphs in Digital Humanities Quarterly

I can recommend Patrik Svensson's article "The Landscape of Digital Humanities" in Digital Humanities Quarterly as a good read. My comments here are about the internals of handling DHQ's paragraph-based citation scheme.

Quick intro to the issue: DHQ is an online journal. It doesn't have pages to provide a physical solution to the need to make references to specific points in an article. So the html version numbers each paragraph. So far so good. As a reader I can note the paragraph number and cite it in a future publication.

But I'm not sure DHQ has quite the right implementation of this good idea. I'm arbitrarily picking the paragraph numbered 118. The one that starts, "Information technology, or more broadly the digital, can be seen as affording objects of analysis for the humanities."

Note that I don't include a link directly to that paragraph. That's because I can't. Looking at the HTML source, I see:
<div class="counter">118</div><div class="ptext">Information technology, or more broadly the digital, can be seen as affording...


That's somewhat unfortunate. It would be great if the '<div class="ptext">' were changed to read '<div id="p118" class="ptext">'. Then I could mint a URL of the form:
http://www.digitalhumanities.org/dhq/vol/4/1/000080/000080.html#p118


It would be even cooler if the <div class="counter">118</div> also read:
<div class="counter"><a href="#p118">118</a></div>


I've wrapped the paragraph number in a link to the paragraph. That way a user can right/control-click on the link and copy-and-paste it into an e-mail or other work. Easy self-reference to an internal citation structure.
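
Both changes are mechanical enough to script. Here's a rough Python sketch, assuming the counter/ptext pair always appears exactly as in the snippet above; the function name and the sample string are mine, and I have no idea how DHQ actually generates its HTML.

```python
import re

# Matches the counter/ptext pair exactly as it appears in the HTML snippet.
PAIR = re.compile(r'<div class="counter">(\d+)</div><div class="ptext">')


def add_paragraph_anchors(html):
    """Give each numbered paragraph an id and make its counter a self-link."""
    def repl(match):
        n = match.group(1)
        return (f'<div class="counter"><a href="#p{n}">{n}</a></div>'
                f'<div id="p{n}" class="ptext">')
    return PAIR.sub(repl, html)


# Shortened sample based on the paragraph quoted above.
sample = ('<div class="counter">118</div><div class="ptext">'
          'Information technology, or more broadly the digital...</div>')
print(add_paragraph_anchors(sample))
```

A regular expression is a blunt tool for HTML and would miss any variation in attribute order or whitespace, so this is a sketch of the idea and not something to run against the live site. The real fix belongs in whatever stylesheet produces the pages.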

I'd also like to see the paragraph numbers represented in the XML source. Again, taking a snippet of that, the start of the paragraph numbered 118 in the HTML appears in the XML as:
<p>Information technology, or more broadly the digital, can be seen as affording objects of analysis for the humanities...


Unless I'm missing something, the published citation scheme isn't represented in the archival version. I think it should be. Even if DHQ considers the paragraph numbers ephemeral, I think there's a valid scholarly need for them to be persistent.
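
As a sketch of what recording the numbers in the XML could look like, the Python below stamps each <p> with an "n" attribute in document order. Big assumptions: that the HTML counter simply counts every <p> in order, that an "n" attribute is acceptable in DHQ's schema, and that namespaces can be ignored (the sample is a made-up, namespace-free fragment, not real DHQ markup).

```python
import xml.etree.ElementTree as ET


def number_paragraphs(xml_text):
    """Add an n attribute to every <p>, counting from 1 in document order."""
    root = ET.fromstring(xml_text)
    for i, p in enumerate(root.iter("p"), start=1):
        p.set("n", str(i))
    return ET.tostring(root, encoding="unicode")


# Hypothetical fragment standing in for an article body.
sample = "<body><p>First paragraph.</p><p>Second paragraph.</p></body>"
numbered = number_paragraphs(sample)
print(numbered)
```

Once the number lives in the archival file, later edits to the article don't silently renumber what people have already cited.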

I'm a big fan of DHQ so this is constructive criticism. And I'm sort of hoping that I've misunderstood something and that those paragraph numbers are more meaningful than they seem after one looks under the hood.

Tuesday, August 24, 2010

Corrected Versions of Papers

It's late August so my mind is on other things, like the next stretch of split-rail fence that I need to put in. But I do find it interesting that Heather Baker has used Academia.edu to distribute a corrected version of her paper "The layout of the ziggurat temple at Babylon" that first appeared in Nouvelles Assyriologiques Brèves et Utilitaires 2008.2 (Juin). Feel free to be similarly and vaguely inspired about issues of versioning, "scribal error", reference, etc.

Thursday, June 17, 2010

More Papers on Academia.edu

Sure, Academia.edu is far from perfect. But I continue to be psyched when I see people who have uploaded a bunch of papers or a book. Just a brief "thanks" to those who have added to my collection of digital offprints.

Thursday, June 10, 2010

Academia.edu

When I first heard about Academia.edu, I pretty much ignored it. Basically, the last thing I needed was another social media site.

A few weeks ago, however, Chuck Jones sent me an invitation so I took another look. What caught my eye was the papers that users have uploaded. I'm always on the lookout for digital content that I can't find elsewhere so the site's role as a central point for the discovery of articles, etc. is very useful.

Here's an incomplete list of some of my fellow "Academicians" whose scholarship I've downloaded:

I've tried to do my bit as well by pointing to some of my work that's available on-line. And here are links to the Institute for the Study of the Ancient World and American Numismatic Society pages. I'm intrigued by the opportunity to have a lightweight institutional repository that Academia.edu offers to an organization like the ANS.

Not everything is perfect about the site. The 'Department Viewer' - for want of a better name - relies on Flash. That's lame and doesn't work on an iPad. And I'm not sure the developers have realized that the papers are a real asset. It would be nice to be able to see all papers listed by people I'm following. Or all papers from a single department. And more liberal spreading around of "Follow this Person" buttons would be nice. If I'm looking at the page for a particular paper, I should be able to single-click to follow its author.

If you're interested, come join the fun. And remember to link to your digital scholarship. As you can tell, I think that's the real utility of the site.

Coin Hoards, Timelines and KML

http://nomisma.org/kml/thasos-all.kml is a KML file that shows findspots of hoards with coins of Thasos in them. You can see that file rendered with the Google Map API at http://nomisma.org/id/thasos.

This post is about viewing the KML file in Google Earth. If you do that, you'll see a Timeline slider appear in the top left of the Google Earth window. Slide the control to the right and you'll see an explosion of hoards towards the north following the mid-2nd century BC. It's really quite dramatic so give it a shot.

One part of an explanation is that Thasos started striking large numbers of new, larger tetradrachms following 148 BC, with many of these traveling north. Exactly why is a matter of historical interpretation. The Roman province of Macedon was established in 146 BC and that had a profound effect on both the issuance and circulation of coinage.

Nomisma.org is about making this information more accessible so that more scholars can engage with the question.
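
If you'd rather poke at the data than drag a slider, something like the following Python sketch mimics the Timeline control by listing placemarks dated up to a given year. I'm assuming each Placemark carries a TimeStamp with a signed year in its "when" element; I haven't checked that against the real file, and the embedded sample, hoard names, and function name are all hypothetical.

```python
import re
import xml.etree.ElementTree as ET

KML_NS = {"kml": "http://www.opengis.net/kml/2.2"}

# Hypothetical sample; the real thasos-all.kml may be structured differently.
SAMPLE_KML = """<kml xmlns="http://www.opengis.net/kml/2.2">
  <Document>
    <Placemark>
      <name>Hypothetical hoard A</name>
      <TimeStamp><when>-0200</when></TimeStamp>
    </Placemark>
    <Placemark>
      <name>Hypothetical hoard B</name>
      <TimeStamp><when>-0140</when></TimeStamp>
    </Placemark>
    <Placemark>
      <name>Hypothetical hoard C</name>
      <TimeStamp><when>-0100</when></TimeStamp>
    </Placemark>
  </Document>
</kml>"""


def placemarks_up_to(kml_text, year):
    """Return names of placemarks whose signed year is at or before `year`."""
    root = ET.fromstring(kml_text)
    names = []
    for pm in root.iterfind(".//kml:Placemark", KML_NS):
        when = pm.findtext("kml:TimeStamp/kml:when", default="", namespaces=KML_NS)
        m = re.match(r"\s*(-?\d+)", when)  # leading signed year only
        if m and int(m.group(1)) <= year:
            names.append(pm.findtext("kml:name", default="", namespaces=KML_NS))
    return names


print(placemarks_up_to(SAMPLE_KML, -150))
print(placemarks_up_to(SAMPLE_KML, -100))
```

Stepping the year forward and counting the results is a crude, scriptable version of what the slider shows visually.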