Friday, February 26, 2010

Coming to terms with HTML 5

I haven't heard much talk among digital humanists about HTML 5. If I've missed something please let me know.

I will admit that for a long time I sort of ignored it. I was interested in xhtml 2 but that's dead. And when the html 5 discussions began, xhtml seemed like a barely tolerated intruder. That's clearly less so currently. Then there was the dismissive attitude of the "5" folk towards RDFa. Everybody seems to be talking now and that's good.

So it looks like there will be an XHTML5 that directly supports RDFa. I'm assuming that means in the DOM as it's made available to Javascript. (Somebody tell me if I'm wrong about that.)

With this in mind, I spent the day catching up with developments in the html 5 community, sometimes focusing on integration with RDFa and sometimes reading more generally.

This series of articles by "boblet" was well-written and useful. On the RDFa front, I read Mark Birbeck's discussion about tokenizing RDFa. Likewise interesting. And see the RDFa section of the Microformats.org HTML5 page.

I like that html 5 supports structures along the lines of:
<html>
<head>
<title></title>
</head>
<body>
<section>
<summary></summary>
</section>

<section>
<h1>Introduction</h1>
<p></p>
...
</section>
<section>
<h1>Next section</h1>
<p></p>
...
</section>
</body>
</html>
Add in more xhtml 1.0 bits and you can really think about doing a nice job of publishing prose works digitally with the html5 vocabulary. And don't forget the '<article>' element. That looks interesting as well.
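
One thing the section structure makes easy is pulling out a rough outline of a document. Here's a minimal sketch in Python (standard library only) that records the first <h1> of each <section>. The names SectionOutline and outline are my own, purely for illustration, and this is not the outline algorithm from the html 5 spec, just a quick approximation of the idea.

```python
from html.parser import HTMLParser


class SectionOutline(HTMLParser):
    """Record, for each <section>, the text of its first <h1> (or None)."""

    def __init__(self):
        super().__init__()
        self.headings = []     # one entry per <section>, in document order
        self._open = []        # stack of indexes into self.headings
        self._capture = False  # True while inside an <h1> we are recording

    def handle_starttag(self, tag, attrs):
        if tag == "section":
            self.headings.append(None)
            self._open.append(len(self.headings) - 1)
        elif tag == "h1" and self._open and self.headings[self._open[-1]] is None:
            # Only the first <h1> of the innermost open section is kept.
            self.headings[self._open[-1]] = ""
            self._capture = True

    def handle_endtag(self, tag):
        if tag == "h1":
            self._capture = False
        elif tag == "section" and self._open:
            self._open.pop()

    def handle_data(self, data):
        if self._capture:
            self.headings[self._open[-1]] += data


def outline(markup):
    """Return the list of section headings found in the markup."""
    parser = SectionOutline()
    parser.feed(markup)
    parser.close()
    return parser.headings


# A cut-down version of the structure shown above.
SAMPLE = """<body>
<section><summary></summary></section>
<section><h1>Introduction</h1><p></p></section>
<section><h1>Next section</h1><p></p></section>
</body>"""

print(outline(SAMPLE))
```

A section with no heading shows up as None, which is a handy way to spot parts of a text that still need a title.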

Not all is perfect. I've always been bummed that the title element goes in the head of an (x)html document. That means that if you want it to show up in the document part of a browser window, you have to repeat it. There's some silliness there. Why can't a title element go anywhere? And would it really be a problem if a document had more than one title in it? I can think of use-cases where that works: more than one article in a single html file, or a list of objects that have titles.

And there's still no preferred way of doing footnotes. Section 4.6.26 of the spec is sort of a punt. The boblet articles suggest <aside> for footnotes but that isn't encouraged in the spec. I see that there's a "note" value for the rel attribute on the WHATWG RelExtensions page. That list is an official part of the html 5 spec (see "Other Link Types"). But the spec is totally vague on how a proposed rel value moves to actual approval.

And anybody using xhtml is still going to have lots of decisions about what goes in class attributes and how to specify lots of basic things like 'author'. That smacks of being proprietary. How much can Dublin Core help with this?

So... it was a day of mostly reading. I added a little bit of xhtml 5 to the git repository under an xhtml5 branch but only just a hint of what I should do to really "commit" to such a big change.

Tuesday, February 23, 2010

OpenCyc + Wiki/DB-Pedia and Ancient World References

This is another post in the Ancient World RDFa series.

I'm writing now because I have two questions in mind, one fairly general and one very specific:
  • Is there a pre-existing ontology that I can use to identify concepts found in Ancient World scholarship?
  • How can I indicate the office of "strategos" that was held by the sophist Polemon?
The topic comes up because I'm faced with the sentence fragment:
Polemon also appears as strategos on coins of Hadrian...
Again, the question is how to mark the text "strategos" so that it is identified as the ancient office. Here's what I have so far:
<span
id="id8296"
about="#id8296"
typeof="skos:Concept opencyc:PublicOffice"
rel="owl:sameAs"
resource="[dbpedia:Strategos]"
property="rdfs:label"
>strategos</span>
That gives the following RDF/Turtle:
<http://example.org/ajn2006-smyrna.html#id8296>
a opencyc:PublicOffice, skos:Concept ;
owl:sameAs dbpedia:Strategos ;
rdfs:label "strategos"@en .
In short, this says that there's an instance of a public office and that office is "strategos".
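
To make the mapping from attributes to triples concrete, here's a rough sketch in Python (standard library only) that reads the span above and builds the same four statements. It is not an RDFa processor: it only handles this one pattern (about, typeof, rel plus resource, and property on a single element). The base URI comes from the Turtle above, the "en" language tag is assumed to be inherited from an enclosing element, and the "opencyc" prefix is expanded using the namespace given in this post, so the expanded type URI is only as good as that prefix. The names expand and SpanTriples are mine.

```python
from html.parser import HTMLParser

# Prefix map. The "opencyc" entry follows the namespace stated in this post.
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "owl": "http://www.w3.org/2002/07/owl#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "dbpedia": "http://dbpedia.org/resource/",
    "opencyc": "http://sw.opencyc.org/",
}
BASE = "http://example.org/ajn2006-smyrna.html"


def expand(curie):
    """Expand a CURIE (or a [safe CURIE]) to a full URI."""
    prefix, _, local = curie.strip("[]").partition(":")
    return PREFIXES[prefix] + local


class SpanTriples(HTMLParser):
    """Collect triples from elements that carry an 'about' attribute."""

    def __init__(self, base, lang):
        super().__init__()
        self.base, self.lang = base, lang
        self.triples = []
        self._pending = None  # (subject, predicate) waiting for literal text
        self._text = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if "about" not in a:
            return
        # Assumption: 'about' is a fragment such as "#id8296".
        subject = self.base + a["about"]
        for curie in a.get("typeof", "").split():
            self.triples.append((subject, expand("rdf:type"), expand(curie)))
        if "rel" in a and "resource" in a:
            self.triples.append((subject, expand(a["rel"]), expand(a["resource"])))
        if "property" in a:
            self._pending = (subject, expand(a["property"]))
            self._text = []

    def handle_data(self, data):
        if self._pending:
            self._text.append(data)

    def handle_endtag(self, tag):
        if self._pending:
            subject, predicate = self._pending
            literal = '"%s"@%s' % ("".join(self._text), self.lang)
            self.triples.append((subject, predicate, literal))
            self._pending = None


SPAN = """<span
id="id8296"
about="#id8296"
typeof="skos:Concept opencyc:PublicOffice"
rel="owl:sameAs"
resource="[dbpedia:Strategos]"
property="rdfs:label"
>strategos</span>"""

parser = SpanTriples(BASE, "en")
parser.feed(SPAN)
parser.close()
for triple in parser.triples:
    print(triple)
```

Each attribute contributes its own statement about the same subject: typeof gives the two types, rel and resource give the link to DBPedia, and property turns the element's text into the label.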

The "opencyc" namespace maps to "http://sw.opencyc.org/". You can read about OpenCyc at http://www.opencyc.org, where you'll be told that OpenCyc is an "ontology containing hundreds of thousands of terms, along with millions of assertions relating the terms to each other, forming an ontology whose domain is all of human consensus reality." Even accounting for "commercial-speak", this could be useful. And yes, it's based on a commercial product, but CC-Licensed versions of the whole thing can be downloaded from http://www.opencyc.org/downloads.

The landing place for PublicOffice is http://sw.opencyc.org/2009/04/07/concept/en/PublicOffice. "Mayor" and "Ambassador" are example instances of PublicOffice so I'm comfortable using it as the type for Strategos. But "Strategos" itself is not in OpenCyc. I think this will be a common situation: knowledge bases intended for the modern world will have many useful analogs for concepts that appear in Ancient World scholarship, but the specific vocabulary will be missing.

OpenCyc has entries for a range of general concepts. You can replace many narrowly scoped namespaces with these and other concepts that appear in OpenCyc.

But again, no "Strategos". This is where Wikipedia (via DBPedia) comes in. Here's the Wikipedia article. I map that into the Semantic Web via DBPedia.

So here's a basic principle: OpenCyc is the default ontology, DBPedia is the default vocabulary. I think that plays to the strengths of each resource.
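
If that principle holds, the markup for any such term follows one pattern: an OpenCyc type plus a DBPedia identity. Here's a small sketch in Python of a helper that writes the span for a given term. The function name concept_span and its parameters are hypothetical, something I'm making up to illustrate the pattern, and it assumes the "opencyc", "dbpedia", "skos", "owl" and "rdfs" prefixes are declared elsewhere in the document.

```python
from html import escape


def concept_span(text, ident, opencyc_type, dbpedia_name):
    """Build an RDFa span typed with OpenCyc and identified via DBPedia."""
    attrs = [
        ("id", ident),
        ("about", "#" + ident),
        ("typeof", "skos:Concept opencyc:" + opencyc_type),
        ("rel", "owl:sameAs"),
        ("resource", "[dbpedia:%s]" % dbpedia_name),
        ("property", "rdfs:label"),
    ]
    # Escape attribute values and text so stray quotes or angle brackets
    # in a term cannot break the markup.
    rendered = " ".join('%s="%s"' % (k, escape(v, quote=True)) for k, v in attrs)
    return "<span %s>%s</span>" % (rendered, escape(text))


print(concept_span("strategos", "id8296", "PublicOffice", "Strategos"))
```

Called with the values from the example above, this produces the same attributes as the hand-written span, just on a single line.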

Neither is complete for the Ancient World. That's probably more of a problem for the use of OpenCyc. DBPedia doesn't have a page for the ceramic type "Eastern Sigillata A". If I write one for Wikipedia, that will eventually migrate to DBPedia. OpenCyc doesn't have an easy route for community-based editing. Will the concepts "Excavation Unit" or "Survey Collection Unit" be necessary? Probably. That means coming up with or finding an ontology for those.

Thursday, February 4, 2010

Ancient World Digital Publishing Test Suite

This post is just a brief notice that I have begun a test suite of xhtml+rdfa and related documents to facilitate my work on digital publication for ancient world scholarship. It's very much "pre-release" at this point so I'm putting the suite out there for the sake of sharing, not because it's useful in its current state.

Right now, there are a few files in a git repository at http://github.com/sfsheath/awdp-test/. To download, try http://github.com/sfsheath/awdp-test/archives/master.

As the files become more useful, I'll talk more about what I'm trying to achieve with this project.