Thursday, February 21, 2008

PRAP and Geography Markup Language (GML)

I have begun moving PRAP's GIS data from ESRI shapefiles into Geography Markup Language (GML) and have included the resulting files in the pre-release version of the PRAP Digital Archive.

The initial conversion was easy. I used MacPorts (macports.org) to install the GDAL libraries, which include the OGR tools for working with vector data. Converting the shapefile that represents the tracts PRAP walked took a single command: 'ogr2ogr -f "GML" tracts.gml tracts.shp'. This produced an XML file in GML format that needed only a little cleaning before it was on its way to something I'll be comfortable archiving. Here's an example of how the outline of a tract is currently represented in that file:
<gml:featureMember gml:id="prap:collectionunit:K94-001">
  <gml:geometryProperty>
    <gml:Polygon>
      <gml:outerBoundaryIs>
        <gml:LinearRing>
          <gml:coordinates>15200.344657999999981,28729.004949499998474 15213.297062499999811,28831.356068500001129 15221.920192999999927,28827.91327050000109 15228.021194499999183,28823.439226499998767 15222.873737500000061,28762.960040000001754 15268.410846500000844,28761.380233500000031 15266.11084800000026,28728.121744000000035 15200.344657999999981,28729.004949499998474</gml:coordinates>
        </gml:LinearRing>
      </gml:outerBoundaryIs>
    </gml:Polygon>
  </gml:geometryProperty>
</gml:featureMember>
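
For anyone who wants to read that structure outside a GIS, here is a minimal Python sketch that pulls the coordinate string out of a feature and turns it into number pairs. It uses only the standard library. The sample ring is shortened and made up (it is not PRAP data), and the names parse_ring and ring_area are mine, not part of any PRAP tooling.

```python
import xml.etree.ElementTree as ET

GML_NS = "http://www.opengis.net/gml"

# A shortened, made-up ring laid out like the tract above (not PRAP data).
SAMPLE = """<gml:featureMember xmlns:gml="http://www.opengis.net/gml">
  <gml:geometryProperty>
    <gml:Polygon>
      <gml:outerBoundaryIs>
        <gml:LinearRing>
          <gml:coordinates>0,0 10,0 10,5 0,5 0,0</gml:coordinates>
        </gml:LinearRing>
      </gml:outerBoundaryIs>
    </gml:Polygon>
  </gml:geometryProperty>
</gml:featureMember>"""


def parse_ring(coord_text):
    """Turn 'x,y x,y ...' into a list of (x, y) float tuples."""
    return [tuple(float(v) for v in pair.split(",")) for pair in coord_text.split()]


def ring_area(ring):
    """Unsigned area of a closed ring, computed with the shoelace formula."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


root = ET.fromstring(SAMPLE)
coords = root.find(f".//{{{GML_NS}}}coordinates")
ring = parse_ring(coords.text)

# Number of vertices, whether the ring closes on itself, and its area.
print(len(ring), ring[0] == ring[-1], ring_area(ring))
```

The area is in squared grid units, whatever the unit of the grid happens to be.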

Sure, GML is verbose, but that's not a problem since I'm delivering the archive as a compressed tarball. An actual problem is that the PRAP data is not currently georeferenced: the coordinates you see above are on an arbitrary grid that the project established for its study area. Converting this grid to UTM is doable and is "on my list".
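
I haven't done the conversion yet, so what follows is only a sketch of one common approach: a similarity transform (shift, rotate, scale) from the local grid to UTM. It assumes the grid is planar and that the study area is small enough for such a transform to be adequate. Every parameter value is a placeholder; the real ones would have to be derived from control points known in both systems. The function name make_grid_to_utm is hypothetical.

```python
import math


def make_grid_to_utm(origin_e, origin_n, rotation_deg=0.0, scale=1.0):
    """Return a function mapping local grid (x, y) to (easting, northing).

    All four parameters are placeholders: real values would come from
    control points surveyed in both the local grid and UTM.
    """
    theta = math.radians(rotation_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)

    def transform(x, y):
        # Rotate counter-clockwise by theta, scale, then shift to the origin.
        easting = origin_e + scale * (x * cos_t - y * sin_t)
        northing = origin_n + scale * (x * sin_t + y * cos_t)
        return easting, northing

    return transform


# Hypothetical parameters, chosen only so the arithmetic is easy to follow.
to_utm = make_grid_to_utm(origin_e=500000.0, origin_n=4000000.0)
print(to_utm(15200.0, 28729.0))
```

With no rotation and a scale of 1, the transform is just an offset, which makes it easy to check by hand before real parameters go in.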

After a few more conversions, the files "tracts.gml", "coast.gml", and "surveyed.gml" can be found in the "spatial/gml" directory of the archive. I have displayed all of these in the open-source mapping application QGIS, so I know they're passable GML, if not fully compliant with the latest version of the standard.
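
Short of opening each file in a mapping program, a small script can at least confirm that a file parses as XML and report the extent of its coordinates. This is a sketch under assumptions: it expects geometry in gml:coordinates elements with comma-separated values, as in the tract example above, and the sample document is invented. It says nothing about schema compliance.

```python
import xml.etree.ElementTree as ET

GML_NS = "http://www.opengis.net/gml"

# Invented two-feature document, stripped down to the coordinate elements.
SAMPLE = """<gml:FeatureCollection xmlns:gml="http://www.opengis.net/gml">
  <gml:featureMember>
    <gml:coordinates>0,0 10,0 10,5 0,0</gml:coordinates>
  </gml:featureMember>
  <gml:featureMember>
    <gml:coordinates>-2,3 4,8 1,1 -2,3</gml:coordinates>
  </gml:featureMember>
</gml:FeatureCollection>"""


def gml_extent(gml_text):
    """Return (min_x, min_y, max_x, max_y) over every gml:coordinates element."""
    root = ET.fromstring(gml_text)  # raises ET.ParseError if not well-formed
    xs, ys = [], []
    for node in root.iter(f"{{{GML_NS}}}coordinates"):
        for pair in (node.text or "").split():
            x, y = pair.split(",")[:2]  # ignore a z value if one is present
            xs.append(float(x))
            ys.append(float(y))
    if not xs:
        raise ValueError("no gml:coordinates found")
    return min(xs), min(ys), max(xs), max(ys)


print(gml_extent(SAMPLE))
```

To run it on a real file, read the file's text and pass it to gml_extent in place of SAMPLE.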

I have also added an examples directory. The only files in there now are "mapars.xsl" and "ars.gml". "mapars.xsl" is a stylesheet that queries the pottery records in the file 'compiled/prap_gml.xml' to find the African Red-Slip sherds, looks up the tracts in which those sherds were found, and writes those tracts to a GML file that can likewise be displayed in QGIS or any other GML-aware mapping program. Here's the stylesheet:
<?xml version="1.0" encoding="utf-8"?>
<!--
  Running 'xsltproc mapars.xsl ../compiled/prap_gml.xml > ars.gml'
  will produce a GML file showing tracts where African Red-Slip (ars) was
  recorded.
-->
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:x="http://www.w3.org/2002/06/xhtml2/"
    xmlns:gml="http://www.opengis.net/gml"
    xmlns:exsl="http://exslt.org/common"
    extension-element-prefixes="exsl">
  <xsl:output method="xml" encoding="utf-8" indent="yes"/>

  <!-- Lookup keys: elements by class, pottery records by ware, features by gml:id. -->
  <xsl:key name="classes" match="//*[@class]" use="@class"/>
  <xsl:key name="ware" match="//x:div[@class='pottery']" use="x:span[@property='ware']"/>
  <xsl:key name="gml:ids" match="//*[@gml:id]" use="@gml:id"/>

  <xsl:template match="/">
    <gml:FeatureCollection xmlns:gml="http://www.opengis.net/gml">
      <!-- Fixed bounding box, in the project's local grid coordinates. -->
      <gml:boundedBy>
        <gml:Box>
          <gml:coord>
            <gml:X>14625.166504</gml:X>
            <gml:Y>13709.92893965028</gml:Y>
          </gml:coord>
          <gml:coord>
            <gml:X>35518.298225</gml:X>
            <gml:Y>30227.18022459958</gml:Y>
          </gml:coord>
        </gml:Box>
      </gml:boundedBy>

      <!-- For each African Red-Slip record, copy the tract it came from. -->
      <xsl:for-each select="key('ware','African Red-Slip')">
        <xsl:variable name="cuid" select="concat('prap:collectionunit:',x:span[@property='cuidentifier'])"/>
        <xsl:copy-of select="key('gml:ids',$cuid)"/>
      </xsl:for-each>
    </gml:FeatureCollection>
  </xsl:template>

</xsl:stylesheet>
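
For readers who don't read XSLT, here is roughly the same lookup as a Python sketch using only the standard library. The miniature document is invented: I'm assuming pottery records are x:div elements with x:span children carrying 'ware' and 'cuidentifier' properties, as the keys in the stylesheet suggest, and "Coarse ware" is a made-up ware name. One deliberate difference: as I read the stylesheet, it copies a tract once for every matching pottery record, while this sketch keeps each tract only once. The function name tracts_with_ware is hypothetical.

```python
import xml.etree.ElementTree as ET

X = "{http://www.w3.org/2002/06/xhtml2/}"
GML = "{http://www.opengis.net/gml}"

# Invented miniature of compiled/prap_gml.xml; the real layout may differ.
SAMPLE = """<root xmlns:x="http://www.w3.org/2002/06/xhtml2/"
      xmlns:gml="http://www.opengis.net/gml">
  <x:div class="pottery">
    <x:span property="cuidentifier">K94-001</x:span>
    <x:span property="ware">African Red-Slip</x:span>
  </x:div>
  <x:div class="pottery">
    <x:span property="cuidentifier">K94-001</x:span>
    <x:span property="ware">African Red-Slip</x:span>
  </x:div>
  <x:div class="pottery">
    <x:span property="cuidentifier">K94-002</x:span>
    <x:span property="ware">Coarse ware</x:span>
  </x:div>
  <gml:featureMember gml:id="prap:collectionunit:K94-001"/>
  <gml:featureMember gml:id="prap:collectionunit:K94-002"/>
</root>"""


def tracts_with_ware(root, ware):
    """Return the features whose gml:id matches a pottery record of `ware`."""
    # Same role as the 'gml:ids' key: index every element by its gml:id.
    by_id = {el.get(GML + "id"): el for el in root.iter() if el.get(GML + "id")}
    found = {}
    for div in root.iter(X + "div"):
        if div.get("class") != "pottery":
            continue
        # Collect the record's direct x:span children as property -> text.
        props = {s.get("property"): (s.text or "").strip() for s in div.findall(X + "span")}
        if props.get("ware") != ware:
            continue
        cuid = "prap:collectionunit:" + props.get("cuidentifier", "")
        if cuid in by_id:
            found[cuid] = by_id[cuid]  # dict keys keep each tract only once
    return list(found.values())


root = ET.fromstring(SAMPLE)
hits = tracts_with_ware(root, "African Red-Slip")
print([el.get(GML + "id") for el in hits])
```

The returned elements could then be appended to a gml:FeatureCollection and serialized, which is the part the stylesheet handles with xsl:copy-of.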

"ars.gml" is the mappable GML file; see the comment at the top of the stylesheet for the command that will generate it once you've unpacked the tarball.

To be clear, I think this is incredibly cool. PRAP is now moving towards a consistent XML-based representation for all its data other than digitized photographs. I've already written about using SVG to represent ceramic profiles, and I will apply the same technique to PRAP's drawings. This consistency of representation will allow PRAP to use the same set of open-source tools to manipulate, analyze, report on, and publish its results. And not only will we be able to publish the base data and the results, we will be equally clear about the processes that were applied to the data in order to reach our interpretations.

Data, algorithms and results will all be there for everyone to see. Or at least, that's the idea. There's a long way to go before it happens.

1 comment:

Carl Reed (Geospatial Standards) said...

Well done! Would it be OK to add your PRAP GML activity to a list of GML application schemas and other information that we maintain on ogcnetwork.net?

You can respond to my OGC email creed@opengeospatial.org

Regards

Carl Reed
CTO
OGC