Friday, February 29, 2008

Various Items

  • The Antikythera Survey Project (ASP) has a nice website. The results page has links to brief overviews of Prehistoric, Classical-Roman, and Medieval-Recent Pottery. These pages are available in Greek as well. The Downloads page provides access to a selection of field data and imagery.
  • On a related note, the ASP page on Classical-Roman Pottery links to a PDF of Acta Terrae Septemcastrensis V.1 (2006). Speaking of 'rare publications', Worldcat lists only one institution that subscribes to this series, though cataloging issues may be hiding other holdings. But this one volume can't be rare if it's online, right? Download it for its many contributions that include ceramic evidence. For example, the ASP article on page 223 by Nikoleta Pyrrou illustrates a stamped Phokaian Red-Slip (LRC) sherd and some hard-to-make-out Late Roman Amphora 2 sherds.
  • I've poked around Search Pigeon, a Google Custom Search tool for open-access journals. Searching for 'roman pottery' led me to Hrčak: Portal of scientific journals of Croatia, where I found Kristina Jelinčić's Roman Pottery from Ilok, and to Marwan Abu Khalaf, Ibrahim Abu A‘mar, Salah Al-Houdalieh, and Robert Hoyland's The Byzantine and Early Islamic settlement of Khirbat Shuwayka. The latter has nice color images of Late Roman pottery.
  • The UPenn library now lists Babesch - Bulletin Antieke Beschaving in its digital holdings. Among other titles, I downloaded D. Malfitana, J. Poblome and J. Lund, Eastern Sigillata A in Italy: a socio-economic evaluation, vol. 80 (2005) and D. Steures' Late Roman Thirst: how dark coloured drinking sets from Trier were used, vol. 77 (2002). I am glad to now have both in my collection of digital offprints.

Wednesday, February 27, 2008

Fathi Bejaoui, Céramique et Religion Chrétienne

Current Epigraphy runs a series of posts on rare publications. Here's one in ceramics:

Bejaoui, F. (1997) Céramique et Religion Chrétienne: les thèmes bibliques sur la sigillée africaine, Tunis. (worldcat)

According to Worldcat, it is held by five American and two European institutions.

As for my own reaction: the sometimes poor black-and-white images don't detract from this being a useful collection of interesting iconography.

Various Items

  • The list of watchable video episodes at Emerging Cypriot, from the Pyla-Koutsopetria Archaeological Project, continues to grow. The second and third shorts have a lot of ceramic content and are well worth downloading. The whole idea is great, but I have a critique of the presentation. When you go to the site, the titles of the not-yet-released episodes are shown, but you have to mouse over the thumbnail images representing the available ones to see what you're going to get. This is odd and may put the visual effect triggered by the mouse-over ahead of the experience of a user trying to find a particular episode. Once they are all released, it will be hard to find any one title.
  • The PDF of F. de Callataÿ and H. Gitler's The Coin of Coins illustrates a nice overlap between ceramic and numismatic imagery on page 38.
  • Gabriel Bijovsky's article A Byzantine Gold Hoard from Bet She'an from the American Numismatic Society Magazine has images of sixth to seventh century ceramics as well. The full publication of the hoard is in Revue Numismatique 158 (2002).

Thursday, February 21, 2008

PRAP and Geography Markup Language (GML)

I have begun moving PRAP's GIS data from ESRI shapefiles into Geography Markup Language (GML) and have included the resulting files in the pre-release version of the PRAP Digital Archive.

The initial conversion was easy. I used the site to install the GDAL libraries, which include tools for working with vector data. Converting the shapefile that represents the tracts PRAP walked required only executing the command 'ogr2ogr -f "GML" tracts.gml tracts.shp'. This produced an xml file in gml format that required just a little cleaning to be on its way to something that I'll be comfortable archiving. Here's an example of how the outlines of a tract are currently represented in that file:
<gml:featureMember gml:id="prap:collectionunit:K94-001">
...
<gml:coordinates>15200.344657999999981,28729.004949499998474 15213.297062499999811,28831.356068500001129 15221.920192999999927,28827.91327050000109 15228.021194499999183,28823.439226499998767 15222.873737500000061,28762.960040000001754 15268.410846500000844,28761.380233500000031 15266.11084800000026,28728.121744000000035 15200.344657999999981,28729.004949499998474</gml:coordinates>
...
</gml:featureMember>

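The coordinates in that fragment can be sanity-checked outside a GIS. Here is a minimal Python sketch, with hypothetical function names, that parses a gml:coordinates string (assuming the GML 2 defaults: a comma between x and y, whitespace between pairs), checks that the ring is closed, and computes the tract's area with the shoelace formula. The result is in squared units of the project's local grid, whatever those happen to be.

```python
def parse_gml_coordinates(text):
    """Split a GML 2 <gml:coordinates> string into (x, y) float tuples.

    Assumes the default separators: ',' within a tuple, whitespace between tuples.
    """
    points = []
    for pair in text.split():
        x, y = pair.split(",")
        points.append((float(x), float(y)))
    return points


def ring_area(points):
    """Shoelace-formula area of a closed ring (first point repeated at the end)."""
    if points[0] != points[-1]:
        raise ValueError("ring is not closed")
    twice_area = sum(x1 * y2 - x2 * y1
                     for (x1, y1), (x2, y2) in zip(points, points[1:]))
    return abs(twice_area) / 2.0


# Coordinates of tract K94-001, copied from the fragment above.
K94_001 = (
    "15200.344657999999981,28729.004949499998474 "
    "15213.297062499999811,28831.356068500001129 "
    "15221.920192999999927,28827.91327050000109 "
    "15228.021194499999183,28823.439226499998767 "
    "15222.873737500000061,28762.960040000001754 "
    "15268.410846500000844,28761.380233500000031 "
    "15266.11084800000026,28728.121744000000035 "
    "15200.344657999999981,28729.004949499998474"
)

points = parse_gml_coordinates(K94_001)
print(len(points) - 1, "distinct vertices")
print(round(ring_area(points), 1), "square grid units")
```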
Sure, GML is verbose, but that's not a problem since I'm delivering the archive as a compressed tar ball. An actual problem is that the PRAP data is not currently georeferenced, so the coordinates you see above are in an arbitrary grid the project established for its study area. Converting this grid to UTM is doable and is "on my list".
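One common way to make that kind of conversion, sketched here under assumptions rather than as PRAP's actual plan, is a 2D similarity (Helmert) transformation fitted to control points whose coordinates are known in both systems. With exactly two control points the fit is exact. The control-point values below are invented placeholders, not surveyed PRAP or UTM coordinates, and a real conversion would more likely use several points and a least-squares fit.

```python
def similarity_from_two_points(local1, utm1, local2, utm2):
    """Return a function mapping local grid (x, y) to UTM (easting, northing).

    Treats points as complex numbers and solves w = a*z + b, where a encodes
    rotation plus uniform scale and b the translation.
    """
    z1, z2 = complex(*local1), complex(*local2)
    w1, w2 = complex(*utm1), complex(*utm2)
    a = (w2 - w1) / (z2 - z1)
    b = w1 - a * z1

    def to_utm(x, y):
        w = a * complex(x, y) + b
        return (w.real, w.imag)

    return to_utm


# Hypothetical control points: placeholders, NOT real PRAP or UTM values.
to_utm = similarity_from_two_points(
    (15000.0, 28000.0), (300000.0, 4100000.0),
    (16000.0, 29000.0), (300990.0, 4101010.0),
)
print(to_utm(15200.344657999999981, 28729.004949499998474))
```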

After a few more conversions, the files "tracts.gml", "coast.gml", and "surveyed.gml" can be found in the "spatial/gml" directory of the archive. I have displayed all of these in the open source mapping application QGis so that I know they're passable GML, if not fully compliant with the latest version of the standard.

I have also added an examples directory. The only files in there now are "mapars.xsl" and "ars.gml". "mapars.xsl" is a style sheet that will query the pottery records in the file 'compiled/prap_gml.xml' to find the African Red-Slip, extract the tracts in which those sherds were found, and write that list to a gml file that can likewise be displayed in QGis or any other GML-aware mapping program. Here's the stylesheet:
<?xml version="1.0" encoding="utf-8"?>
<!--
Running 'xsltproc mapars.xsl ../compiled/prap_gml.xml > ars.gml'
will produce a gml file showing tracts where African Red-Slip (ars) was
found.
-->
<xsl:stylesheet version="1.0" xmlns:xsl="">
<xsl:output method="xml" encoding="utf-8" indent="yes"/>

<xsl:key name="classes" match="//*[@class]" use="@class"/>
<xsl:key name="ware" match="//x:div[@class='pottery']" use="x:span[@property='ware']"/>
<xsl:key name="gml:ids" match="//*[@gml:id]" use="@gml:id"/>

<xsl:template match="/">
<gml:FeatureCollection xmlns:gml="">

<xsl:for-each select="key('ware','African Red-Slip')">
<xsl:variable name="cuid" select="concat('prap:collectionunit:',x:span[@property='cuidentifier'])"/>
<xsl:copy-of select="key('gml:ids',$cuid)"/>
</xsl:for-each>

</gml:FeatureCollection>
</xsl:template>

</xsl:stylesheet>

"ars.gml" is the mappable gml file; see the comment at the top of the stylesheet for the command that will generate it once you've unpacked the tar ball.

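For readers who would rather not work in XSLT, the same selection can be sketched in Python with the standard library's ElementTree. This is an illustration, not the archive's own tooling: the sample records and sherd numbers are invented, the xhtml namespace is left out for brevity, the GML namespace URI is assumed to be the GML 2 one, and, unlike the stylesheet, the function copies each tract only once even when several ARS sherds came from it.

```python
import xml.etree.ElementTree as ET

GML = "http://www.opengis.net/gml"  # assumption: the GML 2 namespace
GML_ID = f"{{{GML}}}id"             # ElementTree's name for the gml:id attribute
ET.register_namespace("gml", GML)

# Invented sample records in the div/span style (no xhtml namespace, for brevity).
POTTERY = """<records>
<div class="pottery" id="prap:pottery:K94-001-01">
  <span property="ware">African Red-Slip</span>
  <span property="cuidentifier">K94-001</span>
</div>
<div class="pottery" id="prap:pottery:K94-001-02">
  <span property="ware">African Red-Slip</span>
  <span property="cuidentifier">K94-001</span>
</div>
<div class="pottery" id="prap:pottery:K94-002-01">
  <span property="ware">Coarse Ware</span>
  <span property="cuidentifier">K94-002</span>
</div>
</records>"""

TRACTS = f"""<gml:FeatureCollection xmlns:gml="{GML}">
<gml:featureMember gml:id="prap:collectionunit:K94-001"/>
<gml:featureMember gml:id="prap:collectionunit:K94-002"/>
</gml:FeatureCollection>"""


def map_ware(pottery_xml, tracts_xml, ware):
    """Collect the tract features whose collection unit yielded the given ware."""
    pottery = ET.fromstring(pottery_xml)
    tracts = ET.fromstring(tracts_xml)
    # Index features by gml:id, as <xsl:key name="gml:ids"> does.
    by_id = {f.get(GML_ID): f for f in tracts.iter() if f.get(GML_ID) is not None}
    out = ET.Element(f"{{{GML}}}FeatureCollection")
    seen = set()
    for div in pottery.iter("div"):
        if div.get("class") != "pottery":
            continue
        if div.findtext("span[@property='ware']") != ware:
            continue
        cuid = "prap:collectionunit:" + div.findtext("span[@property='cuidentifier']")
        if cuid in by_id and cuid not in seen:  # copy each tract once
            seen.add(cuid)
            out.append(by_id[cuid])
    return out


result = map_ware(POTTERY, TRACTS, "African Red-Slip")
print(ET.tostring(result, encoding="unicode"))
```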
To be clear, I think this is incredibly cool. PRAP is now moving towards a consistent xml-based representation for all its data other than digitized photographs. I've already written about using svg to represent ceramic profiles. I will apply the same technique to PRAP's drawings. This consistency of representation will allow PRAP to use the same set of open-source tools to manipulate, analyze, report on and publish its results. And not only will we be able to publish the base data and the results, we will be equally clear about the processes that were applied to the data in order to reach our interpretations.

Data, algorithms and results will all be there for everyone to see. Or at least, that's the idea. There's a long way to go before it happens.

Sunday, February 17, 2008

Mediterranean Ceramics Reference Stability Report, Number 5

The MCRSR first appeared in October, 2007. For the fifth installment, I am again making only one addition, no. 16, a Halaf period jar from Domuztepe in Turkey. The information for this piece comes from Open Context, a project that describes itself as:
a free, open access resource for the electronic publication of primary field research from archaeology and related disciplines.
It is useful that there is a concise url established for each record in Open Context. On the page linked below, you'll see a "Cite Item" button that provides a full citation for this piece. It seems, however, that all such Open Context URLs include the string "space.php?". A brief exchange with Eric Kansa, Open Context's lead developer, has indicated that these unnecessary characters may be removed in the future.

The previously listed URLs for MCRSR items 1-15 remain valid. Readers of last month's report will note that this means that no. 4 from the American Journal of Archaeology is still available at

1. Walters' Catalogue of the Roman Pottery in the Departments of Antiquities, British Museum from Google Books:

2. Robinson's Agora V from JSTOR:

3. Lattara 6:

4. K. Greene's AJA article on Early Roman lead glazed pottery:

5. Heath and Tekkök, Greek, Roman and Byzantine Pottery at Ilion (Troia):

6. Vessel from Çatalhöyük (via Flickr):

7. A Late Minoan III Pyxis from the Metropolitan Museum of Art:

8. An undocumented ARS Hayes 70 bowl from the dealer Classical Numismatics Group:

9. Fifteenth Century Mosque Lamp from Jerusalem now in the British Museum:

10. The Perseus Project Vase Catalog:

11. Wikimedia Commons Image of a Greek Geometric Skyphos in the Louvre:

12. Sagalassos from Pleiades:

13. Inscribed pot from Aphrodisias (HTML):

14. Inscribed pot from Aphrodisias (XML):

15. Hellenistic lamp from Assos, Turkey at the Museum of Fine Arts, Boston:

16. Open Context record for Halaf period jar from Domuztepe, Turkey:

Friday, February 8, 2008

Emerging Cypriot Update

As noted by Bill Caraher in a comment to my last post, "An Artifact's Journey" wasn't supposed to be released yet. As I said, it's a nice piece so I hope it comes back soon.

Thursday, February 7, 2008

Emerging Cypriot

The Pyla-Koutsopetria Archaeological Project has begun to release a collection of video shorts under the title Emerging Cypriot. So far they seem nicely done. It does look like there's a little confusion with the links in that the top-right thumbnail, "Learning Fieldwalking", links to the video "An Artifact's Journey". No bother since "Journey" follows the route of a sherd through processing and drawing. I enjoyed that.

One not-quite-substantive comment: the URL for the "Journey" short ends with "An%20Artifact's%20Journey.m4v". That means the file name has spaces and an apostrophe in it. This wasn't a problem for me, but I can imagine situations in which those characters will cause problems on a command line.
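A small Python sketch of the issue, using only the standard library: percent-decoding the end of that URL gives back the real file name, shlex.quote makes it safe to paste into a POSIX shell command, and a simple substitution produces a name that needs no quoting at all. The renaming convention at the end is a hypothetical suggestion, not something the project uses.

```python
import re
import shlex
from urllib.parse import unquote

# Last path segment of the video URL quoted in the post.
url_tail = "An%20Artifact's%20Journey.m4v"

filename = unquote(url_tail)      # percent-decoding restores the spaces
print(filename)                   # An Artifact's Journey.m4v

# Quoting for a POSIX shell; shlex.split undoes shlex.quote.
quoted = shlex.quote(filename)
print("ls -l " + quoted)

# Hypothetical convention for command-line-friendly names: replace any run of
# characters other than letters, digits, '.', '_' or '-' with an underscore.
safe = re.sub(r"[^A-Za-z0-9._-]+", "_", filename)
print(safe)                       # An_Artifact_s_Journey.m4v
```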

PRAP Images: From Join Table to Containment

In my ongoing effort to create an archival version of PRAP's field data, I'm looking at how we dealt with image metadata and what we want to do now. A basic issue with the photographic record of an archaeological field project is that each photo can be of multiple subjects and each potential subject - a site, a tract, a piece of pottery - can appear in multiple photos. Photograph 203.29 is an example.

These relationships aren't very hard to handle adequately in a modern database using a straightforward many-to-many structure. At PRAP we had a table listing photographs and a separate table listing which identifiable entities appeared in each photograph. The latter can be called a join table. A schematic representation of the content of image 203.29 would be as follows:

Image Table
id: 203.29
filmtype: color slide
caption: byz sherds from area B.

Image<->Subject Join Table
photo_id: 203.29
subject_id: B92-181-02
position: left

photo_id: 203.29
subject_id: B92-186-03
position: right

<info about sherd B92-181-02>

<info about sherd B92-186-03>
I'm skipping over a whole lot of detail, but I hope it's clear that this establishes that sherd B92-181-02 is the object at the left of photo 203.29. You can see this information put into action with further links on the PRAP website, which is serving the FileMaker versions of our databases.
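To make the many-to-many idea concrete, here is a small Python sketch that holds the schematic records in plain data structures and answers the question in both directions: what appears in a given photo, and which photos show a given sherd. The variable and function names are hypothetical stand-ins for the FileMaker tables, not PRAP's actual schema; the second sherd number follows the photolog description of 203.29 quoted later in the post (B-92-186-3).

```python
# Hypothetical stand-ins for the image table and the image<->subject join table.
images = {
    "203.29": {"filmtype": "color slide", "caption": "byz sherds from area B."},
}
image_subject = [
    {"photo_id": "203.29", "subject_id": "B92-181-02", "position": "left"},
    {"photo_id": "203.29", "subject_id": "B92-186-03", "position": "right"},
]


def subjects_in(photo_id):
    """All (subject, position) pairs recorded for one photo."""
    return [(row["subject_id"], row["position"])
            for row in image_subject if row["photo_id"] == photo_id]


def photos_of(subject_id):
    """All photos in which one subject appears."""
    return [row["photo_id"] for row in image_subject
            if row["subject_id"] == subject_id]


print(images["203.29"]["caption"], subjects_in("203.29"))
print(photos_of("B92-181-02"))
```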

At this point I should say that it was Debi Harlan, now of the excellent ArchAtlas project, and I who implemented this system in the field.

One way to represent this many-to-many concept in xml is just to wrap markup around the records of the join table and leave it at that. For example:

Image<->Subject Join
<div class="imagelink">
<span property="image" src="prap:image:203.29" />
<span property="subject" src="prap:pottery:B92-181-02" />
<span property="position">Left</span>
</div>
<div class="imagelink">
<span property="image" src="prap:image:203.29" />
<span property="subject" src="prap:pottery:B92-186-03" />
<span property="position">Right</span>
</div>

Image Info
<div class="image" id="prap:image:203.29">
<span property="filmtype">CS</span>
<span property="label">Late Byzantine decorated base contiguous to Site B02, Kavalaria; Byzantine decorated bowl rim from B team tract</span>
<span property="photologdescription">B-92-181-2 (left), B-92-186-3 (right), respectively LByz decorated base, Byz decorated bowl rim</span>
</div>

You can see that each 'imagelink' div refers to a separately instantiated image and a separately instantiated subject. Unfortunately, there is some fragility and a lot of unnecessary overhead to this structure. The fragility comes from the possibility of change in one of the joined tables. Perhaps "203.29" was a typo in the database. If you change the id of a photo, it needs to be changed in the join information as well.
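A short Python sketch of that fragility, again with invented in-memory stand-ins for the two tables and hypothetical function names: correcting a photo id in the image table alone leaves join rows pointing at an id that no longer exists, which a simple referential check will catch, while a consistent rename has to touch both places.

```python
# Hypothetical stand-ins: a set of photo ids and (photo_id, subject_id, position) rows.
photo_ids = {"203.29"}
join_rows = [
    ("203.29", "B92-181-02", "left"),
    ("203.29", "B92-186-03", "right"),
]


def dangling_rows(photo_ids, join_rows):
    """Join rows whose photo_id matches no photo record."""
    return [row for row in join_rows if row[0] not in photo_ids]


def rename_photo(photo_ids, join_rows, old, new):
    """Rename a photo consistently: in the photo table AND in every join row."""
    new_ids = (photo_ids - {old}) | {new}
    new_rows = [(new if p == old else p, s, pos) for p, s, pos in join_rows]
    return new_ids, new_rows


# Fixing the "typo" in the photo table alone breaks both links...
print(dangling_rows({"203.30"}, join_rows))
# ...whereas a consistent rename leaves nothing dangling.
ids, rows = rename_photo(photo_ids, join_rows, "203.29", "203.30")
print(dangling_rows(ids, rows))
```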

XML allows one to take advantage of containment to incorporate the link information directly into the image database as follows:

<div class="image" id="prap:image:203.29">
<span property="filmtype">CS</span>
<span property="label">Late Byzantine decorated base contiguous to Site B02, Kavalaria; Byzantine decorated bowl rim from B team tract</span>
<span property="photologdescription">B-92-181-2 (left), B-92-186-3 (right), respectively LByz decorated base, Byz decorated bowl rim</span>
<span property="subjects">
<span>
<span property="subject" src="prap:pottery:B92-181-02" />
<span property="position">Left</span>
</span>
<span>
<span property="subject" src="prap:pottery:B92-186-03" />
<span property="position">Right</span>
</span>
</span>
</div>

I'm not yet happy with the details of this markup. Among the many odd things is the extra 'span' surrounding each subject-position pair. I'd like to call that something. The advantage of this representation is that the text from <div> to </div> is a self-contained description of the image with well-structured links to the named entities that appear within it.

One last point... I was not about to cut-and-paste the subject info into each image div. Instead, I wrote a quick-and-dirty xslt stylesheet to process the two xml datasets and produce a single set of image descriptions that point to their subjects. Here it is:

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
xmlns:x="" >

<xsl:key name="classes" match="//*[@class]" use="@class"/>
<xsl:key name="srcs" match="//*[@src]" use="@src"/>

<xsl:template match="/">
<xsl:for-each select="key('classes','image')">
<div class="{@class}" id="{@id}">
<xsl:copy-of select="./*"/>
<span property="subjects">
<xsl:for-each select="key('srcs',@id)/..">
<span>
<xsl:choose>
<xsl:when test="count(x:span[@property='subject']) &gt; 1">
<xsl:apply-templates select="x:span[not(@property='image')]"/>
</xsl:when>
<xsl:otherwise>
<xsl:copy-of select="x:span[not(@property='image')]"/>
</xsl:otherwise>
</xsl:choose>
</span>
</xsl:for-each>
</span>
</div>
</xsl:for-each>
</xsl:template>

<xsl:template match="x:span">
<xsl:if test="not(contains(@src,':collectionunit:'))">
<xsl:copy-of select="."/>
</xsl:if>
</xsl:template>

</xsl:stylesheet>

This stylesheet has now been used to combine the image and image link portions of the PRAP Digital Archive and the tar ball has been updated.
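The same join-to-containment step can be sketched in Python with ElementTree, which may be easier to follow for readers new to XSLT. This re-expresses the logic for illustration and is not the script used on the archive: the input is an invented, namespace-free sample, the function name is hypothetical, and collection-unit subjects are dropped only when a link record names more than one subject, mirroring the xsl:when test in the stylesheet.

```python
import xml.etree.ElementTree as ET

# Invented, namespace-free sample in the post's div/span style.
SOURCE = """<data>
<div class="image" id="prap:image:203.29">
  <span property="filmtype">CS</span>
</div>
<div class="imagelink">
  <span property="image" src="prap:image:203.29"/>
  <span property="subject" src="prap:pottery:B92-181-02"/>
  <span property="position">Left</span>
</div>
<div class="imagelink">
  <span property="image" src="prap:image:203.29"/>
  <span property="subject" src="prap:pottery:B92-186-03"/>
  <span property="position">Right</span>
</div>
</data>"""


def contain_links(root):
    """Fold 'imagelink' join records into their images as a 'subjects' span."""
    links_by_image = {}
    for div in root.iter("div"):
        if div.get("class") == "imagelink":
            target = div.find("span[@property='image']").get("src")
            links_by_image.setdefault(target, []).append(div)

    out = ET.Element("data")
    for image in root.iter("div"):
        if image.get("class") != "image":
            continue
        new = ET.SubElement(out, "div", {"class": "image", "id": image.get("id")})
        new.extend(list(image))  # the image's own spans, unchanged
        subjects = ET.SubElement(new, "span", {"property": "subjects"})
        for link in links_by_image.get(image.get("id"), []):
            spans = [s for s in link if s.get("property") != "image"]
            if sum(s.get("property") == "subject" for s in spans) > 1:
                # Several subjects: drop collection-unit references.
                spans = [s for s in spans
                         if ":collectionunit:" not in (s.get("src") or "")]
            pair = ET.SubElement(subjects, "span")  # wrapper per subject/position
            pair.extend(spans)
    return out


print(ET.tostring(contain_links(ET.fromstring(SOURCE)), encoding="unicode"))
```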

I realize that I'm diving right into xml, xslt, etc. without much explanation. One purpose of this post is simply to share notes on what I've done. The other is to move towards the idea of a "PRAP Digital Archive Cookbook" that illustrates how to work with the PRAP data. This post can't be the beginning of such a publication since it has described a process that changed the underlying files in such a way that the just quoted xslt stylesheet will no longer work. But stay tuned for more fun things to do with this developing dataset...

Friday, February 1, 2008

PRAP, xhtml 2.0 and Archaeological Databases

For some years now I have been working in collaboration with many colleagues on data from the Pylos Regional Archaeological Project (PRAP). As with most more-or-less recent field projects, one result of all our work is a collection of database files. Ours happen to be in FileMaker, but that's incidental to this discussion. What I really want to focus on is our decision to package all the data into an archival format that can be made available for use by all comers and for storage by institutions that want to help ensure access to this resource over the long term. But what format should we use?

There are a couple of options that are specific to archaeology and/or cultural heritage management. Open Context is using a subset of the ArchaeoML data model. There's also the CIDOC Conceptual Reference Model. Right now, I am moving the PRAP data into an xhtml 2.0 based representation and thought I'd take the time to say what I like about it. Of course, I don't mean to reject other options. It's just that I'm looking for a lightweight standard in which to test the data and relationships that we want to archive, in as accessible a format as possible. Who knows what the future will bring, so for now I'm focusing on the present.

PRAP is a survey project, so the first phase of fieldwork was to collect material by tracts, defined roughly as units of similar surface conditions: an olive grove (perhaps divided), a terrace, a fallow field. On the basis of density, Places Of Special Interest (POSIs) were identified; I'll call them sites from now on. Sites were then collected by a grid or by subdividing tracts. Either way, the material from a site consisted of both the tract collection material and the site collection material. Both tracts and the divisions of the site pickup are generically known as "collection units". Collection units yielded pottery, and each collected sherd was numbered by extending the identifier of the collection unit from which it came. In this system, "A92-001" is a tract collection unit, "A93-901001" is a site collection unit, and "A92-001-01" is a sherd from the tract, whereas "A93-901001-01" is from the site collection.
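The numbering scheme lends itself to mechanical parsing. Below is a minimal Python sketch based only on the four example identifiers above; the regular expression, the function name, and the rule that a six-digit unit number marks a site collection (and a three-digit one a tract) are inferences from those examples, not a documented PRAP specification.

```python
import re

# Inferred pattern (an assumption): team letter, two-digit year, a hyphen,
# the collection unit number, and optionally a hyphen plus a two-digit sherd number.
PRAP_ID = re.compile(
    r"^(?P<team>[A-Z])(?P<year>\d{2})-(?P<unit>\d+)(?:-(?P<sherd>\d{2}))?$"
)


def parse_prap_id(identifier):
    """Split a PRAP-style identifier into collection unit, method, and sherd."""
    match = PRAP_ID.match(identifier)
    if match is None:
        raise ValueError(f"unrecognized identifier: {identifier!r}")
    unit = match.group("unit")
    return {
        "collection_unit": f"{match.group('team')}{match.group('year')}-{unit}",
        # Assumption: six-digit unit numbers are site pickups, others are tracts.
        "method": "site collection" if len(unit) == 6 else "tract collection",
        "sherd": match.group("sherd"),  # None for a bare collection unit
    }


for ident in ["A92-001", "A93-901001", "A92-001-01", "A93-901001-01"]:
    print(ident, parse_prap_id(ident))
```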

I'll now go over some broad ideas for representing this data model in xhtml.

Version 2.0 of xhtml includes the Core, Embedding, and Metainformation attribute modules. In combination with the div and span elements, these modules make the following hypothetical xml fragments/stubs (almost) valid as part of a more complete document.

First, two sherds:

<div class="pottery" id="prap:pottery:A92-001-01">
<span property="ware">African Red Slip</span>
<span property="part">Rim</span>
<span property="quantity">1</span>
<span property="collectionunit" src="prap:collectionunit:A92-001"/>
</div>

<div class="pottery" id="prap:pottery:A93-901001-01">
<span property="ware">African Red Slip</span>
<span property="part">Rim</span>
<span property="quantity">1</span>
<span property="collectionunit" src="prap:collectionunit:A93-901001"/>
</div>

Now, two collection units:

<div class="collectionunit" id="prap:collectionunit:A92-001">
<span property="method">tract collection</span>
<span property="site" src="prap:site:A01"/>
</div>

<div class="collectionunit" id="prap:collectionunit:A93-901001">
<span property="method">site collection</span>
<span property="site" src="prap:site:A01"/>
</div>

Now the site A01:
<div class="site" id="prap:site:A01">
<span property="description">An ancient site.</span>
</div>

In the above model, divs have classes and a unique id and are analogous to records (rows) in a database table. Divs consist of spans, which have properties and either content or a src attribute. Spans are analogous to database columns/fields. If a span has content, that's the value of the property. If it has a src attribute, that's a reference to the id of an existing div. In this near-xml, each sherd is said to come from a collection unit and each collection unit is assigned to a site. Therefore, it is possible to know that these sherds both came from site A01.
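Following those src references is straightforward in any XML toolkit. Here is a Python sketch using the standard library's ElementTree on the fragments above, trimmed to the ware and linking spans and wrapped in an invented root element; the helper name is hypothetical. It indexes every div by id and then walks from sherd to collection unit to site.

```python
import xml.etree.ElementTree as ET

# The fragments above (trimmed), wrapped in an invented root element.
DOC = """<data>
<div class="pottery" id="prap:pottery:A92-001-01">
  <span property="ware">African Red Slip</span>
  <span property="collectionunit" src="prap:collectionunit:A92-001"/>
</div>
<div class="pottery" id="prap:pottery:A93-901001-01">
  <span property="ware">African Red Slip</span>
  <span property="collectionunit" src="prap:collectionunit:A93-901001"/>
</div>
<div class="collectionunit" id="prap:collectionunit:A92-001">
  <span property="method">tract collection</span>
  <span property="site" src="prap:site:A01"/>
</div>
<div class="collectionunit" id="prap:collectionunit:A93-901001">
  <span property="method">site collection</span>
  <span property="site" src="prap:site:A01"/>
</div>
<div class="site" id="prap:site:A01">
  <span property="description">An ancient site.</span>
</div>
</data>"""

root = ET.fromstring(DOC)
by_id = {div.get("id"): div for div in root.iter("div")}


def follow(record_id, prop):
    """Return the src of the record's span with the given property (a linked id)."""
    span = by_id[record_id].find(f"span[@property='{prop}']")
    return None if span is None else span.get("src")


for sherd in ["prap:pottery:A92-001-01", "prap:pottery:A93-901001-01"]:
    unit = follow(sherd, "collectionunit")
    site = follow(unit, "site")
    print(sherd, "->", unit, "->", site, "->", by_id[site].findtext("span"))
```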

I like the fact that each div is self-documenting as to its structure. That's better than a line in a tab-separated text file. I also like that the metainformation is strongly typed. Take for example the following snippet of xslt:

<xsl:key name="classes" match="//*[@class]" use="@class"/>
<xsl:key name="ids" match="//*[@id]" use="@id"/>
<xsl:key name="properties" match="//*[@property]" use="@property"/>
<xsl:key name="srcs" match="//*[@src]" use="@src"/>

When applied to a large repository of xhtml data, this code will build quickly searchable indexes of all the class, id, property, and src attributes. That will in turn allow for efficient navigation of the database structure. You can see this in practice if you download and unpack the file at prapdigitalarchive-prerelease.tar.gz. Pay attention to the "prerelease" in the filename. What you're getting is my very preliminary effort to put the thoughts expressed above into action. You are also getting a mass of unedited field data, so don't be picky. And note the CC by-nc-nd license on the files.
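For comparison, the same four indexes can be sketched in Python. This is not how the archive's stylesheets work, only an illustration of what xsl:key provides: one pass over the document, then direct lookups by attribute value. ElementTree keeps no parent pointers, so the sketch builds a child-to-parent map to mimic the XPath '..' step a reverse lookup needs. The sample data repeats the collection-unit fragments above; the function name is hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

DOC = """<data>
<div class="collectionunit" id="prap:collectionunit:A92-001">
  <span property="method">tract collection</span>
  <span property="site" src="prap:site:A01"/>
</div>
<div class="collectionunit" id="prap:collectionunit:A93-901001">
  <span property="method">site collection</span>
  <span property="site" src="prap:site:A01"/>
</div>
<div class="site" id="prap:site:A01">
  <span property="description">An ancient site.</span>
</div>
</data>"""


def build_keys(root, names=("class", "id", "property", "src")):
    """Index every element by each attribute's value, like one xsl:key per name."""
    keys = {name: defaultdict(list) for name in names}
    for element in root.iter():
        for name in names:
            value = element.get(name)
            if value is not None:
                keys[name][value].append(element)
    return keys


root = ET.fromstring(DOC)
keys = build_keys(root)
parent = {child: p for p in root.iter() for child in p}

# Equivalent of key('srcs', 'prap:site:A01')/.. : records that point at site A01.
referrers = [parent[span].get("id") for span in keys["src"]["prap:site:A01"]]
print(len(keys["class"]["collectionunit"]), "collection units")
print(referrers)
```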

Some points of interest. Unpack the file in its own directory. If you want to just see output, look in the pdf folder for sitegazetteer.pdf. If you want to generate this file yourself, you'll need to execute something like the following sequence of commands from within the directory you created for unpacking:

xmllint -xinclude data/include.xml > data/prap.xml
xsltproc xslt/sitegazetteer.xsl data/prap.xml >
fop sitegazetteer.pdf

[Fop is available here.]

Or look at xslt/test.xsl to see a simple manipulation of class, id, property and src attributes. Run that against prap.xml for some not very interesting output.

I don't know that anybody will actually download the file and run these transformations. My point is that you can. That's another advantage of using an xml-based data model and workflow. We can publish not only the data but the scripts that we use to manipulate that data. That's an important principle being embraced by researchers, particularly in the sciences.

So... I'm at the beginning of developing the PRAP Digital Archive and this is the first public announcement of it. There are many little things to do and big decisions to make before it gets anywhere near being "finished". But I'll update the tar ball as I go along and will highlight the ceramic bits as appropriate.