Tuesday, May 18, 2010

Towards a metadata header for XHTML5+RDFa1.1 Digital Publications

XHTML5 defines elements such as 'header' and 'summary' that make it easier to mark up document metadata, but on their own they are not a finished solution for embedding that metadata in a born-digital scholarly publication. In this post I take an initial crack at a decent way of doing this.

To cut to the chase, here's a sample document:
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:bibo="http://purl.org/ontology/bibo/"
      xmlns:dc="http://purl.org/dc/terms/"
      xmlns:dctypes="http://purl.org/dc/dcmitype/"
      xmlns:foaf="http://xmlns.com/foaf/0.1/"
      xmlns:owl="http://www.w3.org/2002/07/owl#"
      xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
<head>
  <title property="dc:title">Guidelines for Using XHTML5 to encode Digital Publications</title>
  <base href="http://example.org/digpub"/>
</head>
<body>
<header>
  <div rel="bibo:authorList">
    <ul rel="rdf:Seq">
      <li rel="rdf:li">
        By <span rel="dc:creator">
          <span rel="foaf:Person">
            <span property="foaf:name" rel="owl:sameAs" resource="http://en.wikipedia.org/wiki/Albert_Gallatin">Albert Gallatin</span>
          </span>
        </span>
      </li>
      <li rel="rdf:li">
        and <span rel="dc:creator">
          <span rel="foaf:Person">
            <span property="foaf:name" rel="owl:sameAs" resource="http://en.wikipedia.org/wiki/William_Alexander_Hammond">William Alexander Hammond</span>
          </span>
        </span>
      </li>
    </ul>
  </div>
  <summary property="dc:description" xml:lang="en">An abstract in English.</summary>
  <summary property="dc:description" xml:lang="fr">Un résumé en français.</summary>
</header>
<h1>Section 1</h1>
<p>Your text here.</p>
</body>
</html>

And here's the Turtle representation of the embedded RDF:
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix : <http://www.w3.org/1999/xhtml> .
@prefix bibo: <http://purl.org/ontology/bibo/> .
@prefix dc: <http://purl.org/dc/terms/> .
@prefix dctypes: <http://purl.org/dc/dcmitype/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .

<http://example.org/digpub>
    dc:description "An abstract in English."@en, "Un résumé en français."@fr ;
    dc:title "Guidelines for Using XHTML5 to encode Digital Publications" ;
    bibo:authorList [
        rdf:Seq _:bnode1
    ] ;
    a dctypes:Text .

_:bnode1
    rdf:li [
        dc:creator [
            foaf:Person [
                owl:sameAs <http://en.wikipedia.org/wiki/Albert_Gallatin> ;
                foaf:name "Albert Gallatin"
            ]
        ]
    ], [
        dc:creator [
            foaf:Person [
                owl:sameAs <http://en.wikipedia.org/wiki/William_Alexander_Hammond> ;
                foaf:name "William Alexander Hammond"
            ]
        ]
    ] .

Even if you don't read Turtle, that might make some sense.

  • <http://example.org/digpub> after the prefixes means we're describing properties of the document at that URI.
  • The last line of that indented section simply says that the document is 'a dctypes:Text' resource.
  • dc:description "...." means the RDF extractor has found a Dublin Core description (used here more or less as an abstract). The abstract is available in two languages, as indicated by the 'xml:lang' attributes in the document.
  • The same goes for dc:title. Note that I don't put the title in the 'body' element because HTML-family encoding schemes want it up in the 'head'. Perhaps the title should be repeated in 'body/header', but I'm inclined to think that can be done on delivery to a browser, when a document is published via a web server. The archival version should not have such repetition.
  • We then come to a 'bibo:authorList', the contents of which are specified following the line beginning '_:bnode1'. The Bibliographic Ontology (here 'bibo') uses this construct for multi-authored works. I'm not sure I like it, especially since it imposes the extra nesting of rdf:Seq and rdf:li. But if 'bibo' is widely adopted (which it sort of is), then it's not my place to complain: conform to the standard and move on. The contents of each rdf:li in a bibo:authorList are not well defined in the spec, so I looked through the bibo examples, adopted their use of dc:creator and foaf:Person, and added an owl:sameAs for good measure.
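
To make the notes above a little more concrete, here is a rough Python sketch of how the literal values (title, abstracts, author names) can be pulled out of such a document. It is not a conforming RDFa processor: it ignores 'rel', 'resource', prefix resolution and subject chaining, and simply reports each 'property' attribute with its text and 'xml:lang'. The SAMPLE string is a trimmed, hypothetical copy of the document, and the function name literal_properties is my own invention for illustration.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the fixed XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Trimmed, hypothetical copy of the sample document. Prefixed names such as
# "dc:title" are treated as plain strings here, so only the XHTML namespace
# is declared.
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title property="dc:title">Guidelines for Using XHTML5 to encode Digital Publications</title>
</head>
<body>
<ul rel="rdf:Seq">
<li rel="rdf:li"><span property="foaf:name">Albert Gallatin</span></li>
<li rel="rdf:li"><span property="foaf:name">William Alexander Hammond</span></li>
</ul>
<summary property="dc:description" xml:lang="en">An abstract in English.</summary>
<summary property="dc:description" xml:lang="fr">Un résumé en français.</summary>
</body>
</html>"""


def literal_properties(xhtml_text):
    """Return (property, text, language) tuples in document order.

    language is None when the element carries no xml:lang attribute.
    """
    root = ET.fromstring(xhtml_text)
    found = []
    for element in root.iter():
        prop = element.get("property")
        if prop is not None:
            text = "".join(element.itertext()).strip()
            found.append((prop, text, element.get(XML_LANG)))
    return found


if __name__ == "__main__":
    for prop, text, lang in literal_properties(SAMPLE):
        suffix = "@" + lang if lang else ""
        print(f'{prop} "{text}"{suffix}')
```

Because the elements are visited in document order, the two foaf:name values come out in the same order as the author list, which is the ordering that the rdf:Seq nesting is meant to preserve. A real RDFa distiller would additionally attach each value to the right subject.
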
My point in all this is to use existing standards so that a corpus of born-digital scholarship can represent its metadata in a machine-readable way while keeping the "text parts" human-readable. I'm just at the beginning of this project, so I welcome suggestions about where to look for good models.
