Saturday, December 31, 2011

Toying with 'Knowledge Representation and Reasoning' for the Ancient World

This is a very (very!) rough opening entry in a discussion I hope to push forward in 2012. But first some preliminaries.
  • I don't know a lot about "Knowledge Representation and Reasoning" but I do know more than I did 48 hours ago. I'm in the world of "Semantic Reasoning" and "OWL 2 Ontologies". That's an interesting, and often very technical, place to be. But fun, all the same.
  • That's why I put "Toying" in the title of this post. I'm really just playing around here and figure I won't find out what I'm doing wrong if I don't share thoughts sooner rather than later.
I've opened a github repository at https://github.com/sfsheath/awo so I'll just dive right in using the mini-ontology that I started there. 'awo' stands for 'Ancient World Ontology' and, again, that's what I'm thinking about. 

The file 'awo.owl' defines, among other things, two people: 'Augustus' and 'Lucius Cornelius Sulla'. This is an opportunity to note that the authority file I'm using for names (of people or other entities) is Wikipedia. I don't know of another publicly accessible resource with such extensive coverage combined with a simple mechanism for creating new identities. As it stands now (at this GitHub commit), awo.owl says the following about Augustus and Sulla:

  <owl:Thing rdf:about="#Augustus">
    <rdfs:label>Augustus</rdfs:label>
    <rdf:type rdf:resource="http://schema.org/Person" />
    <is rdf:resource="#Roman_Emperor" />
    <is rdf:resource="#Pontifex_Maximus" />
    <is rdf:resource="#Tribune" />
    <owl:sameAs rdf:resource="http://dbpedia.org/page/Augustus" />
    <owl:sameAs rdf:resource="http://viaf.org/viaf/18013086" />
  </owl:Thing>

  <owl:Thing rdf:about="#Lucius_Cornelius_Sulla">
    <rdfs:label>Lucius Cornelius Sulla</rdfs:label>
    <rdf:type rdf:resource="http://schema.org/Person" />
    <is rdf:resource="#Roman_Dictator" />
    <owl:sameAs rdf:resource="http://dbpedia.org/page/Lucius_Cornelius_Sulla" />
  </owl:Thing>

I hope some of the 'meaning' of this markup is accessible even without 'knowing' OWL. I'm asserting that there are entities (owl:Things) "Augustus" and "Lucius_Cornelius_Sulla". The markup then connects those to other defined entities such as "Roman_Emperor" and "Roman_Dictator". Again, those names are taken from Wikipedia.
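For readers who think better in triples than in angle brackets, here is a rough Python sketch of what the Augustus entry asserts once it is read as RDF. It is a toy parser that handles only this simple typed-node pattern, not RDF/XML in general (a real project would use a proper RDF library such as rdflib). The wrapping rdf:RDF element, the base URI http://example.org/awo (taken from the URIs in awo-reasoned.rdf), and the default namespace assumed for the unprefixed "is" element are all assumptions, since the excerpt above leaves them out.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
BASE = "http://example.org/awo"  # assumed base URI, matching awo-reasoned.rdf

# The Augustus entry from awo.owl, wrapped in an rdf:RDF root.
# The default namespace used for <is> is an assumption.
SNIPPET = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:owl="http://www.w3.org/2002/07/owl#"
    xmlns="http://example.org/awo#">
  <owl:Thing rdf:about="#Augustus">
    <rdfs:label>Augustus</rdfs:label>
    <rdf:type rdf:resource="http://schema.org/Person" />
    <is rdf:resource="#Roman_Emperor" />
    <is rdf:resource="#Pontifex_Maximus" />
    <is rdf:resource="#Tribune" />
    <owl:sameAs rdf:resource="http://dbpedia.org/page/Augustus" />
    <owl:sameAs rdf:resource="http://viaf.org/viaf/18013086" />
  </owl:Thing>
</rdf:RDF>"""


def resolve(ref):
    """Turn a fragment reference like '#Augustus' into a full URI."""
    return BASE + ref if ref.startswith("#") else ref


def expand(tag):
    """Convert ElementTree's '{namespace}local' tag form into a plain URI."""
    ns, local = tag[1:].split("}")
    return ns + local


def triples(xml_text):
    """Yield (subject, predicate, object) triples from simple typed-node RDF/XML."""
    root = ET.fromstring(xml_text)
    for node in root:
        subject = resolve(node.get(f"{{{RDF}}}about"))
        # A typed node element such as <owl:Thing> implies an rdf:type statement.
        yield (subject, RDF + "type", expand(node.tag))
        for prop in node:
            obj = prop.get(f"{{{RDF}}}resource")
            # Resource references become URIs; otherwise keep the literal text.
            yield (subject, expand(prop.tag), resolve(obj) if obj else prop.text)
```

Running `for t in triples(SNIPPET): print(t)` lists eight statements: the implied owl:Thing type plus one statement per child element.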

I know some people won't like the use of "owl:sameAs", but I think it conforms closely to the definition of that term in the OWL 2 documentation. And what about the "is" property? Here I did become concerned that none of the OWL 2 terms for indicating equivalence between owl:Things worked. So I made up the generic "is" property to match the very generic and informal semantics of the fairly reasonable statement "Augustus is a Roman Emperor". I could have used "was", but that seemed silly.

But what about reasoning? The repository also has a file 'awo-reasoned.rdf'. It contains the following (slightly reordered and abridged) statements about both Augustus and Sulla:

  <rdf:Description rdf:about="http://example.org/awo#Lucius_Cornelius_Sulla">
    <rdf:type rdf:resource="http://schema.org/Person"/>
    <j.1:is rdf:resource="http://dbpedia.org/page/Roman_Dictator"/>
    <j.1:is rdf:resource="http://example.org/awo#Roman_Dictator"/>

    <owl:sameAs rdf:resource="http://dbpedia.org/page/Lucius_Cornelius_Sulla"/>
    <owl:sameAs rdf:resource="http://example.org/awo#Lucius_Cornelius_Sulla"/>

    <rdf:type rdf:resource="http://example.org/awo#Office_Holder"/>
    <rdf:type rdf:resource="http://example.org/awo#Roman_Office_Holder"/>
    <rdf:type rdf:resource="http://example.org/awo#Roman_Republican_Office_Holder"/>

    <rdf:type rdf:resource="http://www.w3.org/2002/07/owl#Thing"/>
  </rdf:Description>

  <rdf:Description rdf:about="http://dbpedia.org/page/Augustus">
    <rdf:type rdf:resource="http://schema.org/Person"/>
    <j.1:is rdf:resource="http://example.org/awo#Roman_Emperor"/>
    <j.1:is rdf:resource="http://dbpedia.org/page/Roman_Emperor"/>
    <j.1:is rdf:resource="http://example.org/awo#Pontifex_Maximus"/>
    <j.1:is rdf:resource="http://dbpedia.org/page/Pontifex_Maximus"/>
    <j.1:is rdf:resource="http://example.org/awo#Tribune"/>

    <owl:sameAs rdf:resource="http://viaf.org/viaf/18013086"/>
    <owl:sameAs rdf:resource="http://dbpedia.org/page/Augustus"/>
    <owl:sameAs rdf:resource="http://example.org/awo#Augustus"/>

    <rdf:type rdf:resource="http://example.org/awo#Office_Holder"/>
    <rdf:type rdf:resource="http://example.org/awo#Roman_Office_Holder"/>
    <rdf:type rdf:resource="http://example.org/awo#Roman_Religious_Office_Holder"/>
    <rdf:type rdf:resource="http://example.org/awo#Roman_Imperial_Office_Holder"/>

    <rdf:type rdf:resource="http://www.w3.org/2002/07/owl#Thing"/>
  </rdf:Description>

This file was generated with the command-line tool that comes with Pellet, an open-source OWL-DL reasoner. Another win for open source as far as I'm concerned.

To the extent that the mini awo ontology hints at a useful future, it's because both Sulla and Augustus are 'known' to be "Roman_Office_Holder"s, even though awo.owl never states that directly. The ontology defines the owl:Class "Roman_Republican_Office_Holder" as all owl:Things said to be (via "is") a "Roman_Dictator", and "Roman_Imperial_Office_Holder" as all owl:Things said to be a "Roman_Emperor". Both of these classes are subclasses of "Roman_Office_Holder".
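The inference described above can be approximated with a few lines of forward chaining. This is a drastically simplified stand-in for what Pellet does, not a description of Pellet's algorithm. The class and office names come from awo.owl and the reasoned output; the Pontifex_Maximus rule and the link from Roman_Office_Holder up to Office_Holder are assumptions read off the types Pellet reported for Augustus and Sulla. In OWL itself, a class defined this way would typically be written as an owl:equivalentClass with an owl:hasValue restriction on "is", though the excerpts here don't show exactly how awo.owl expresses it.

```python
# Defined classes: membership follows from an "is" link to a given office.
# The Religious entry is an assumption inferred from the reasoned output.
DEFINED_BY_IS = {
    "Roman_Republican_Office_Holder": "Roman_Dictator",
    "Roman_Imperial_Office_Holder": "Roman_Emperor",
    "Roman_Religious_Office_Holder": "Pontifex_Maximus",
}

# Direct superclass of each class (the Office_Holder link is assumed).
SUBCLASS_OF = {
    "Roman_Republican_Office_Holder": "Roman_Office_Holder",
    "Roman_Imperial_Office_Holder": "Roman_Office_Holder",
    "Roman_Religious_Office_Holder": "Roman_Office_Holder",
    "Roman_Office_Holder": "Office_Holder",
}


def infer_types(is_links):
    """Return the set of class names an entity belongs to, given its 'is' values."""
    types = {cls for cls, office in DEFINED_BY_IS.items() if office in is_links}
    # Walk up the subclass hierarchy until no new superclasses appear.
    frontier = list(types)
    while frontier:
        parent = SUBCLASS_OF.get(frontier.pop())
        if parent and parent not in types:
            types.add(parent)
            frontier.append(parent)
    return types
```

`infer_types({"Roman_Dictator"})` returns Roman_Republican_Office_Holder, Roman_Office_Holder and Office_Holder, the same three awo types listed for Sulla in awo-reasoned.rdf.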

Looking ahead, this simple (simplistic?) demonstration suggests a world in which it is possible to search a corpus of information - be it primary texts or secondary scholarship - for references to "Roman Office Holders" and be shown all documents (or other resources) that reference either Augustus or Sulla. That would be cool.
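Once types have been inferred, the search itself is simple set filtering. A minimal sketch, assuming a hypothetical corpus in which each document is reduced to the set of entity URIs it links to; the document ids are made up and the type sets are abridged from awo-reasoned.rdf.

```python
# Hypothetical corpus: document id -> entity URIs the document links to.
DOCUMENT_LINKS = {
    "doc-1": {"http://example.org/awo#Augustus"},
    "doc-2": {"http://example.org/awo#Lucius_Cornelius_Sulla"},
    "doc-3": {"http://example.org/awo#Tribune"},
}

# Inferred types per entity, abridged from the reasoner's output.
ENTITY_TYPES = {
    "http://example.org/awo#Augustus": {
        "Office_Holder", "Roman_Office_Holder", "Roman_Imperial_Office_Holder"},
    "http://example.org/awo#Lucius_Cornelius_Sulla": {
        "Office_Holder", "Roman_Office_Holder", "Roman_Republican_Office_Holder"},
}


def documents_mentioning(cls):
    """Return ids of documents linking to any entity inferred to be in cls."""
    return sorted(
        doc for doc, uris in DOCUMENT_LINKS.items()
        if any(cls in ENTITY_TYPES.get(uri, set()) for uri in uris)
    )
```

`documents_mentioning("Roman_Office_Holder")` returns both documents that cite a person, while doc-3, which only links to the office of Tribune, is left out.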

If you dig into awo-reasoned.rdf, you'll see that everything it says about "Augustus" it also says about the URI http://viaf.org/viaf/18013086. VIAF is the "Virtual International Authority File". Here I'm trying to (again, simply) explore the idea that if an author were to link to that well-known URI published by VIAF, then it would be discoverable that the document making the link referred not only to the Emperor but also to the more generic concept "Roman_Office_Holder". So imagine an Internet that can be queried for "All references to Roman office holders".
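owl:sameAs is symmetric and transitive, so every URI linked by it falls into one group, and whatever holds of one member holds of all. The sketch below shows just that effect for types, using a small union-find structure; a real OWL reasoner propagates every property, not only types. The URIs are the ones in awo.owl, and the single type is abridged for brevity.

```python
from collections import defaultdict

# sameAs statements from awo.owl, plus an abridged type for Augustus.
SAME_AS = [
    ("http://example.org/awo#Augustus", "http://dbpedia.org/page/Augustus"),
    ("http://example.org/awo#Augustus", "http://viaf.org/viaf/18013086"),
]
TYPES = {"http://example.org/awo#Augustus": {"Roman_Office_Holder"}}


def same_as_groups(pairs):
    """Group URIs into equivalence classes (sameAs is symmetric and transitive)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in pairs:
        parent[find(a)] = find(b)  # merge the two groups
    groups = defaultdict(set)
    for x in list(parent):
        groups[find(x)].add(x)
    return list(groups.values())


def propagate_types(pairs, types):
    """Copy every type asserted for one URI to all URIs in its sameAs group."""
    result = {uri: set(ts) for uri, ts in types.items()}
    for group in same_as_groups(pairs):
        shared = set().union(*(types.get(uri, set()) for uri in group))
        for uri in group:
            result[uri] = set(shared)
    return result
```

With these inputs, the VIAF and DBpedia URIs both come out typed as Roman_Office_Holder, even though that type was only attached to the awo URI.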

And we do want to support more complex queries, such as "All late Roman military sites in Syria within 30 kilometers of the findspots of LR coins or LR African Red-Slip". We're a long way from that, but it's doable on the basis of existing technologies. And the content to support such queries is slowly coming online.
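Part of such a query is ontological (which sites count as late Roman military, which finds count as LR coins or African Red-Slip) and part is plain geometry. The sketch below covers only the geometric part: a great-circle (haversine) distance on a spherical Earth and a 30 km filter. The function names and the (lat, lon) input format are assumptions for illustration, and a production system would more likely hand this step to a spatial database.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; treats the Earth as a sphere


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def sites_near_findspots(sites, findspots, max_km=30.0):
    """Return names of sites within max_km of at least one findspot.

    sites: dict mapping a site name to (lat, lon); findspots: iterable of (lat, lon).
    """
    spots = list(findspots)
    return sorted(
        name for name, (lat, lon) in sites.items()
        if any(haversine_km(lat, lon, flat, flon) <= max_km for flat, flon in spots)
    )
```

For example, `sites_near_findspots({"a": (0.0, 0.0), "b": (0.0, 1.0)}, [(0.0, 0.1)])` returns `["a"]`: one degree of longitude at the equator is about 111 km, so site "b" is roughly 100 km from the findspot.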

Some other points about this world:
  • It's a world that I can think about because of conversations I've been having with my colleagues at ISAW, with the people running Pelagios, with people I've been writing papers and grants with, and others. There's nothing exceptionally original here. But the next step is to be part of "just doing it."
  • There needs to be a mechanism for bringing together existing RDF-based resources into a big pile of triples from which a reasoner can extract interesting relationships. The work can't be done by hand by a few individuals. But if we just let the machines run wild, we'll end up with silly conclusions. We need to find the right balance of automatic processing and community sourcing to create an "Ancient World Inference Engine" or "Ancient World Semantic Reasoner" that is actually useful.
  • And that's probably an important principle: make it useful. Here are some thoughts on that:
    • When a "third party" resource links to a URI such as "http://en.wikipedia.org/wiki/Augustus" (or its VIAF equivalent), it would be nice if there were a JavaScript library that showed a menu offering links based on a JSON serialization of the 'knowledge' in awo-reasoned.rdf. This is an idea that has been floating around and whose time has come.
    • The network of links to stable URIs should be harvested so that the reasoner can work across the entire Ancient World Internet. The Internet is the interface that allows community sourcing.
    • Existing resources that provide stability (such as Perseus, PAS, Pleiades, DBpedia, Open Context, Nomisma.org, and many others) should be incorporated. Keep new work to a minimum.
    • Another way of saying the above is that an "Ancient World Triple Store and Reasoner" should look to be a "pass through" resource reflecting the existing and developing state of the Internet rather than a destination itself.
    • The whole big pile of reasoned triples should be downloadable so that others can pay for the cycles to query it when they're doing something really complex. CC everything!
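As a rough illustration of the JavaScript menu idea, the function below turns the reasoned facts about one entity into JSON that a client-side script could look up by whatever URI a page links to. The JSON layout and the function name are hypothetical; the URIs and types are abridged from awo-reasoned.rdf, and the Wikipedia URL is the one mentioned in the list above.

```python
import json

# Abridged facts for Augustus from awo-reasoned.rdf; the JSON layout is hypothetical.
SAME_AS = [
    "http://example.org/awo#Augustus",
    "http://dbpedia.org/page/Augustus",
    "http://viaf.org/viaf/18013086",
]
TYPES = ["Office_Holder", "Roman_Office_Holder", "Roman_Imperial_Office_Holder"]


def menu_json(label, same_as, types, wikipedia_url):
    """Serialize one entity's reasoned 'knowledge' as a JSON menu keyed by URI."""
    entry = {
        "label": label,
        "links": [{"text": uri, "href": uri} for uri in same_as],
        "types": types,
    }
    # Index the same entry under every URI a page might link to,
    # including the Wikipedia article used as the naming authority.
    keys = [wikipedia_url] + same_as
    return json.dumps({key: entry for key in keys}, indent=2, sort_keys=True)
```

Because the same entry is keyed under the Wikipedia, DBpedia, VIAF and awo URIs, `menu_json("Augustus", SAME_AS, TYPES, "http://en.wikipedia.org/wiki/Augustus")` would give a page linking to any of them the same menu.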
The above has started to wander a little, so I'll end this post here. Let's see what happens in the next year or so...