Showing posts with label rdf. Show all posts

Monday, May 10, 2010

Concept and Document in the Ancient World Semantic Web

This post is really just me taking some notes on semantic web usage. Apologies if it's too discursive but I'm just at the gathering info stage right now.

Along with some colleagues, I've been thinking about the relationships between concrete action and scholarly intent that are inherent in the links we make when creating digital publications.

First some background. Here's a "test" sentence, along with its html.

Themistocles was born in Athens.
Or:
<a href="http://en.wikipedia.org/wiki/Themistocles">Themistocles</a> was born in <a href="http://en.wikipedia.org/wiki/Athens">Athens</a>.


http://en.wikipedia.org/wiki/Athens is a document found on the Internet. As used in our sentence, it is a placeholder for Athens - nebulously defined, I admit - as a concept. Asking the question, "What is the latitude and longitude of Athens?", focuses the issue. It is not useful to respond with the location(s) of the Wikipedia servers. We clearly want to know the location of the site in "the real world", or 37° 58′ 0″ N, 23° 43′ 0″ E.

Links point to documents, but we often mean the underlying concept. Often this distinction doesn't matter. Sometimes it does, as in:

My source for the longitude and latitude of Athens is the Wikipedia article for Athens.

That sentence has the same link appearing two times, one meaning the concept, the other meaning the document. Wikipedia provides no mechanism for distinguishing between these meanings.

DBpedia does implement this distinction. But first, here's the intro sentence from the DBpedia website:
DBpedia is a community effort to extract structured information from Wikipedia and to make this information available on the Web. DBpedia allows you to ask sophisticated queries against Wikipedia, and to link other data sets on the Web to Wikipedia data.

In DBpedia, the following URLs are both valid:

  • http://dbpedia.org/resource/Athens
  • http://dbpedia.org/page/Athens

The first refers to the concept; the second is a specific document. This allows for the following useful HTML:
My source for the longitude and latitude of <a href="http://dbpedia.org/resource/Athens">Athens</a> is the <a href="http://dbpedia.org/page/Athens">DBpedia page</a>.
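Here is a minimal Python sketch of that pairing. It assumes only the '/resource/' and '/page/' path prefixes visible in the two URLs above; the function name is made up for illustration and this is not an official DBpedia API.

```python
from urllib.parse import urlsplit, urlunsplit

RESOURCE_PREFIX = "/resource/"  # concept URIs, per the URLs quoted above
PAGE_PREFIX = "/page/"          # document URLs, per the URLs quoted above


def dbpedia_page_for(resource_uri):
    """Return the document URL that describes a DBpedia concept URI.

    Hypothetical helper: it only swaps the path prefix and does not
    contact DBpedia or check that the resource exists.
    """
    parts = urlsplit(resource_uri)
    if not parts.path.startswith(RESOURCE_PREFIX):
        raise ValueError(f"not a DBpedia resource URI: {resource_uri}")
    name = parts.path[len(RESOURCE_PREFIX):]
    return urlunsplit(
        (parts.scheme, parts.netloc, PAGE_PREFIX + name, parts.query, parts.fragment)
    )


if __name__ == "__main__":
    print(dbpedia_page_for("http://dbpedia.org/resource/Athens"))
```

The point of keeping the two forms apart in code is the same as in prose: the concept URI is what I assert things about, and the page URL is what I cite as a source.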

Looking at the DBpedia page http://dbpedia.org/page/Athens is useful because it gives a list of resources that are each related to dbpedia:Athens via owl:sameAs. These are:

Before looking at one of these, what is owl:sameAs? The OWL Web Ontology language is described here. Among the descriptions of owl:sameAs given there is that a "...typical use of sameAs would be to equate individuals defined in different documents to one another, as part of unifying two ontologies". So the DBpedia usage, which is paralleled in many other semantic web resources, is spot on.

The geonames.org reference is interesting, in part because the site has a discussion that explicitly addresses the difference between concept and document: http://www.geonames.org/ontology/. That page also has a link to a good blog post.

DBpedia follows the Geonames guidelines in using owl:sameAs to qualify its link to http://sws.geonames.org/264371/ , which is the Geonames URI for the concept "Athens". Clicking on that redirects you to the page http://www.geonames.org/264371/athens.html. Note the change of host to 'www.geonames.org' and the addition of 'athens.html'. The serial number remains the same.

Here is a screen grab of the "balloon" that is displayed next to the icon indicating the location of Athens.


There are two interesting links shown in this image: 'perma link' and 'semantic web rdf':

http://www.geonames.org/264371/athens.html is just the link to the page. http://sws.geonames.org/264371/about.rdf is an RDF document. It's worth looking at the source to see the attribute 'rdf:about="http://sws.geonames.org/264371/"'. URLs of the pattern 'http:...about.rdf' are documents. http://sws.geonames.org/264371/ is a concept.
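As a rough illustration, the two 'sws.geonames.org' patterns can be told apart mechanically. The sketch below is in Python and relies only on the URL shapes quoted in this post; the regular expression and function names are my own assumptions, not anything Geonames publishes.

```python
import re

# Matches the two URL shapes discussed above:
#   http://sws.geonames.org/<id>/           -> the concept
#   http://sws.geonames.org/<id>/about.rdf  -> the RDF document about it
SWS_PATTERN = re.compile(r"^http://sws\.geonames\.org/(\d+)/(about\.rdf)?$")


def classify_geonames_uri(uri):
    """Return ("concept" | "document", id), or None if the URI fits neither shape."""
    match = SWS_PATTERN.match(uri)
    if match is None:
        return None
    kind = "document" if match.group(2) else "concept"
    return kind, match.group(1)


def rdf_document_for(concept_uri):
    """Return the about.rdf document URL for a concept URI (hypothetical helper)."""
    result = classify_geonames_uri(concept_uri)
    if result is None or result[0] != "concept":
        raise ValueError(f"not a Geonames concept URI: {concept_uri}")
    return concept_uri + "about.rdf"


if __name__ == "__main__":
    print(classify_geonames_uri("http://sws.geonames.org/264371/"))
    print(rdf_document_for("http://sws.geonames.org/264371/"))
```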

Even with this soup of web addresses, there is a lot that Geonames is doing right. The only missed opportunity I see is that the "264371/athens.html" page gives no explicit indication of the concept address. There is the following: '<link rel="alternate" type="application/rdf+xml" title="RDF Version" href="http://sws.geonames.org/264371/about.rdf" />'. This is a link to a document, not a concept. And 'alternate' is too vague for me to know that I can parse that RDF to find its @about value.
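To make "parse that RDF to find its @about value" concrete, here is a small Python sketch using only the standard library. The sample document is a hand-written stand-in, not the real Geonames output: the 'gn:Feature' element and its namespace are assumptions for illustration, and only the 'rdf:about' value comes from this post.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Hypothetical, stripped-down stand-in for an about.rdf document.
SAMPLE_RDF = """\
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:gn="http://www.geonames.org/ontology#">
  <gn:Feature rdf:about="http://sws.geonames.org/264371/">
    <gn:name>Athens</gn:name>
  </gn:Feature>
</rdf:RDF>
"""


def concept_uris(rdf_xml):
    """Return every rdf:about value found in an RDF/XML string, in document order."""
    root = ET.fromstring(rdf_xml)
    about = "{%s}about" % RDF_NS  # ElementTree spells namespaced attributes this way
    return [el.attrib[about] for el in root.iter() if about in el.attrib]


if __name__ == "__main__":
    print(concept_uris(SAMPLE_RDF))
```

This works, but it is exactly the extra hop I would like to avoid: I have to fetch and parse a second document just to learn the concept URI.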

It would be nice if there were something like '<link rel="concept" type="application/rdf+xml" title="Concept URI" href="http://sws.geonames.org/264371/" />'. I'm not too concerned with what's in @type so I left it as is. But 'concept' is not in any way standard. I just made it up.
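If such a link existed, finding the concept would take a few lines. The Python sketch below scans a page for a '<link>' with a given rel value. The sample HTML is invented for the example, and the 'concept' rel value is, again, something I made up rather than a standard.

```python
from html.parser import HTMLParser

# Invented sample: the first link is the one Geonames serves today (quoted above),
# the second uses the made-up rel="concept".
SAMPLE_HTML = """\
<html><head>
<link rel="alternate" type="application/rdf+xml" title="RDF Version"
      href="http://sws.geonames.org/264371/about.rdf" />
<link rel="concept" type="application/rdf+xml" title="Concept URI"
      href="http://sws.geonames.org/264371/" />
</head><body></body></html>
"""


class LinkCollector(HTMLParser):
    """Collect the attributes of every <link> element as dictionaries."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Self-closing <link ... /> tags also arrive here by default.
        if tag == "link":
            self.links.append(dict(attrs))


def find_link(html, rel):
    """Return the href of the first <link> whose rel contains `rel`, else None."""
    collector = LinkCollector()
    collector.feed(html)
    for link in collector.links:
        if rel in (link.get("rel") or "").split():
            return link.get("href")
    return None


if __name__ == "__main__":
    print(find_link(SAMPLE_HTML, "concept"))
```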

If this post has a point, that's it. Make it really easy for me to figure out which URI is for the concept, because that's the one I really want to use. Or maybe I should end with a question. Is there an unambiguous and widely-accepted convention for indicating the concept lying behind a document? If not, we need one.

Tuesday, January 26, 2010

RDFa Patterns for Ancient World References

I am continuing to experiment with semantic links within digital publications relevant to the Ancient World. Here's a snippet from the same article I drew from in the last post.
In 124, Polemon had spoken before Hadrian and persuaded him to make a gift of money and grant a series of honors to Smyrna, not least of which was a second temple to the imperial cult (IvS 697; Burrell 2004: 42-48).
The "things" I want to identify are:
  • The year 124 as an event.
  • The sophist Polemon
  • The emperor Hadrian
  • The imperial cult
  • And the two citations
And I want to do this in a standards-based way that is automatically recognizable by third-parties (or at least their software agents).

As before, I'm using RDFa. In a future post, I'll explain this choice and talk about what RDFa and RDF are, but for now I'm diving right in.

The relevant namespaces that I'm using are:
  • xmlns:dbpedia="http://dbpedia.org/resource/"
  • xmlns:cito="http://purl.org/net/cito/"
  • xmlns:ev="http://purl.org/rss/1.0/modules/event/"
  • xmlns:ex="http://example.org/"
  • xmlns:foaf="http://xmlns.com/foaf/0.1/"
  • xmlns:frbr="http://purl.org/vocab/frbr/core#"
  • xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#"
  • xmlns:owl="http://www.w3.org/2002/07/owl#"
  • xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
  • xmlns:skos="http://www.w3.org/2008/05/skos#"
  • xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
All the markup that follows is experimental and comments are welcome, of course.
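For readers new to these prefixes, here is a hedged Python sketch of how a prefixed name expands against the namespace list. It copies a few of the declarations above and is not a full RDFa processor. One assumption, based on my reading of RDFa 1.0: @about and @resource take a URI or a bracketed "safe CURIE", so an unbracketed value there is read as a plain URI, while @rel, @property and @typeof take CURIEs directly.

```python
# A subset of the namespace declarations listed above.
NAMESPACES = {
    "dbpedia": "http://dbpedia.org/resource/",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "owl": "http://www.w3.org/2002/07/owl#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
}


def expand_curie(curie):
    """Expand 'prefix:reference' using NAMESPACES (for @rel, @property, @typeof)."""
    prefix, sep, reference = curie.partition(":")
    if not sep or prefix not in NAMESPACES:
        raise KeyError(f"unknown prefix in {curie!r}")
    return NAMESPACES[prefix] + reference


def expand_resource(value):
    """Expand an @about/@resource value.

    Assumption (RDFa 1.0 as I read it): only a bracketed safe CURIE is
    expanded; anything else is taken to be a URI already.
    """
    if value.startswith("[") and value.endswith("]"):
        return expand_curie(value[1:-1])
    return value


if __name__ == "__main__":
    print(expand_curie("foaf:Person"))
    print(expand_resource("[dbpedia:Polemon_of_Laodicea]"))
    print(expand_resource("dbpedia:124"))  # left untouched: no brackets
```

If that reading is right, it would account for values such as '<dbpedia:124>' showing up unexpanded in Turtle output when the brackets are left off.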

Polemon
The reference to Polemon now looks like:
<span id="id2209"
about="#id2209"
typeof="skos:Concept foaf:Person"
resource="[dbpedia:Polemon_of_Laodicea]"
rel="owl:sameAs cite"
property="rdfs:label">Polemon</span>


With the '<head>' of the document including '<base href="http://example.org/ajn2006-smyrna.html"/>', that RDFa gives the following RDF/turtle:

<http://example.org/ajn2006-smyrna.html#id2209>
owl:sameAs dbpedia:Polemon_of_Laodicea ;
a skos:Concept, foaf:Person ;
<http://www.w3.org/1999/xhtml/vocab#cite> dbpedia:Polemon_of_Laodicea ;
rdfs:label "Polemon"@en .
Some observations:
The pairing of 'id' and 'about' attributes means that I can identify a span of text and then say things about it.

I then give that span a type. Here I say that it's a skos:Concept and a foaf:Person. Which concept and which person? http://dbpedia.org/resource/Polemon_of_Laodicea. 'skos:Concept' will be used on all named-entities, and their nature will be further qualified when it's useful.

Why 'owl:sameAs'? Here I follow the usage of dbpedia.org. If you look at the Polemon page, you'll see the same construct used to make the link to freebase. 'owl:sameAs' also underlies sameas.org (see the n3 for Hadrian).

The metaphor here is that I am instantiating Polemon as a concept and person present in the text. That should be recognizable and actionable. There is some redundancy in how I go about doing it, but that is in the spirit of convenience for future processors of this data.
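The pairing of 'id' and 'about' works because the relative reference in @about is resolved against the document's base. A short Python sketch of that resolution, using the base href given earlier (the function name is only illustrative):

```python
from urllib.parse import urljoin

# The <base href> value from the document head, as quoted above.
BASE = "http://example.org/ajn2006-smyrna.html"


def subject_uri(about, base=BASE):
    """Resolve an @about value such as '#id2209' against the document base."""
    return urljoin(base, about)


if __name__ == "__main__":
    print(subject_uri("#id2209"))
```

The result is the subject URI that appears at the top of the Turtle for Polemon.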

"In 124"
This looks like:
<span id="id3724"
about="#id3724"
typeof="skos:Concept frbr:Event"
rel="owl:sameAs"
resource="dbpedia:124"
property="ev:startdate"
datatype="xsd:year"
content="124">In 124</span>
Same basic process. I isolate some text as individually addressable. I say what it is, in this case a FRBR Event. Here I also embed a machine-readable property, the start date, into the document, but retain the inline text as the label.

But I am probably on less-firm ground here. I use FRBR because it's an LOC approved standard. I annotate the event with an RSS Event property and that's a little weak. And it might seem odd to equate the event with the dbpedia representation of the year 124. If you follow through to the wikipedia version, that does refer to Hadrian's trip east, which is the setting for Polemon's speech. In the case of a better known event, I think I'd prefer to link to a representation of that, for example http://dbpedia.org/page/Sack_of_Rome_(455). The 'owl:sameAs' on that page will eventually redirect you to the right Wiki page.

Here's the RDF/Turtle produced by the above RDFa:
<http://example.org/ajn2006-smyrna.html#id3724>
owl:sameAs <dbpedia:124> ;
ev:startdate "124"^^xsd:year ;
a frbr:Event, skos:Concept .
As above, the goal is for this to be usable in a number of contexts.

References
There are two inline references at the end of the sentence. The first is to a primary source, an inscription at Smyrna as published in Petzl, G. (1982). Die Inschriften von Smyrna. Bonn: Habelt. The second is to Burrell, B. (2004). Neokoroi: Greek cities and Roman emperors. Cincinnati classical studies, new ser., v. 9. Leiden: Brill.

Here's the RDFa for the second:
<span id="id4616"
about="#id4616"
typeof="ex:Citation"
rel="cito:citesAsAuthority cite"
resource="http://www.worldcat.org/oclc/53013513"
property="rdfs:label">Burrell 2004: 42-48</span>
This markup is similar to the previous examples, except I'm not instantiating it as a 'skos:Concept'. I am using the CITO ontology to indicate the relationship between the works, but note that I'm currently making up the type 'ex:Citation'. Perhaps I could use 'cito:Document' but that doesn't seem quite right. I really want to mark this span of text as being a citation but haven't found just the right RDF vocabulary. I looked at BIBO but, like CITO, it doesn't have the exact class I want. BIBO is linked with Zotero so I'd like to use it. For now, CITO has a more detailed set of relationships between citing and cited documents so I'm going with that. Worldcat also isn't great because there's confusion about the 'terms of use' but it will do for this experimental phase.

Here's the RDF/Turtle:
<http://example.org/ajn2006-smyrna.html#id4616>
cito:citesAsAuthority <http://www.worldcat.org/oclc/53013513> ;
a ex:Citation ;
<http://www.w3.org/1999/xhtml/vocab#cite> <http://www.worldcat.org/oclc/53013513> ;
rdfs:label "Burrell 2004: 42-48"@en .

The RDFa for the epigraphic reference looks like:
<span id="id9773"
about="#id9773"
typeof="ex:Citation"
rel="cito:citesAsAuthority ex:citesAsPrimarySource"
resource="http://www.worldcat.org/oclc/8935414"
property="rdfs:label"><i>IvS</i> 697</span>
The main difference here is that I'm also making up the 'ex:citesAsPrimarySource' value for the rel attribute. The concept of "Primary Source" and references thereto is important for the Humanities and we need a way of indicating its usage.

It's also important that I'm referring to the publication of the inscription, not the inscription itself. When a digital surrogate becomes available, I can point to that. In the meantime, a way of standardizing references to parts of a work would be useful. But I don't think you can just tag on a fragment identifier, as in http://www.worldcat.org/oclc/8935414#no.%20697, since the implication there is that such an ID actually exists. And it might be rude to put the same after a '?'. Something to ponder...
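For what it's worth, here is how that fragment form comes about, sketched in Python with the standard library. The helper name is hypothetical. The second half shows why the fragment is a purely client-side matter: stripping it gives back the plain Worldcat address, which is all a server would be asked for.

```python
from urllib.parse import quote, urldefrag

WORK = "http://www.worldcat.org/oclc/8935414"


def with_fragment(work_uri, part):
    """Append a percent-encoded fragment naming a part of the work (illustrative only)."""
    return f"{work_uri}#{quote(part)}"


if __name__ == "__main__":
    uri = with_fragment(WORK, "no. 697")
    print(uri)
    # The fragment is split off before any request is made.
    print(urldefrag(uri).url)
```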


Instead of continuing on with each named entity, here's the whole sentence with RDFa visible:
<span id="id3724" about="#id3724" typeof="skos:Concept frbr:Event" rel="owl:sameAs" resource="dbpedia:124" property="ev:startdate" datatype="xsd:year" content="124">In 124</span>, <span id="id2209" about="#id2209" typeof="skos:Concept foaf:Person" resource="[dbpedia:Polemon_of_Laodicea]" rel="owl:sameAs cite" property="rdfs:label">Polemon</span> had spoken before <span id="id5130" about="#id5130" typeof="skos:Concept foaf:Person" rel="owl:sameAs cite" resource="[dbpedia:Hadrian]" property="rdfs:label">Hadrian</span> and persuaded him to make a gift of money and grant a series of honors to <span id="id39156" about="#id39156" typeof="skos:Concept geo:SpatialThing" rel="owl:sameAs cite" resource="http://pleiades.stoa.org/places/550771" property="rdfs:label">Smyrna</span>, not least of which was a second temple to the <span id="id4168" about="#id4168" typeof="skos:Concept dbpedia:Religion" rel="owl:sameAs cite" resource="[dbpedia:Imperial_cult_(ancient_Rome)]" property="rdfs:label">imperial cult</span> (<span id="id9773" about="#id9773" typeof="ex:Citation" rel="cito:citesAsAuthority ex:citesAsPrimarySource" resource="http://www.worldcat.org/oclc/8935414" property="rdfs:label"><i>IvS</i> 697</span>; <span id="id4616" about="#id4616" typeof="ex:Citation" rel="cito:citesAsAuthority cite" resource="http://www.worldcat.org/oclc/53013513" property="rdfs:label">Burrell 2004: 42-48</span>).
And here's the RDF/Turtle:

<http://example.org/ajn2006-smyrna.html#id3724>
owl:sameAs <dbpedia:124> ;
ev:startdate "124"^^xsd:year ;
a frbr:Event, skos:Concept .

<http://example.org/ajn2006-smyrna.html#id2209>
owl:sameAs dbpedia:Polemon_of_Laodicea ;
a skos:Concept, foaf:Person ;
<http://www.w3.org/1999/xhtml/vocab#cite> dbpedia:Polemon_of_Laodicea ;
rdfs:label "Polemon"@en .

<http://example.org/ajn2006-smyrna.html#id5130>
owl:sameAs dbpedia:Hadrian ;
a skos:Concept, foaf:Person ;
<http://www.w3.org/1999/xhtml/vocab#cite> dbpedia:Hadrian ;
rdfs:label "Hadrian"@en .

<http://example.org/ajn2006-smyrna.html#id39156>
owl:sameAs <http://pleiades.stoa.org/places/550771> ;
a geo:SpatialThing, skos:Concept ;
<http://www.w3.org/1999/xhtml/vocab#cite> <http://pleiades.stoa.org/places/550771> ;
rdfs:label "Smyrna"@en .

<http://example.org/ajn2006-smyrna.html#id4168>
owl:sameAs <http://dbpedia.org/resource/Imperial_cult_(ancient_Rome)> ;
a dbpedia:Religion, skos:Concept ;
<http://www.w3.org/1999/xhtml/vocab#cite> <http://dbpedia.org/resource/Imperial_cult_(ancient_Rome)> ;
rdfs:label "imperial cult"@en .

<http://example.org/ajn2006-smyrna.html#id9773>
ex:citesAsPrimarySource <http://www.worldcat.org/oclc/8935414> ;
cito:citesAsAuthority <http://www.worldcat.org/oclc/8935414> ;
a ex:Citation ;
rdfs:label "<i>IvS</i> 697"^^rdf:XMLLiteral .

<http://example.org/ajn2006-smyrna.html#id4616>
cito:citesAsAuthority <http://www.worldcat.org/oclc/53013513> ;
a ex:Citation ;
<http://www.w3.org/1999/xhtml/vocab#cite> <http://www.worldcat.org/oclc/53013513> ;
rdfs:label "Burrell 2004: 42-48"@en .


Some of these constructs deserve more comment but this post is getting long. The only thing to add is that fairly soon I will publish a JavaScript toolset that starts making use of these patterns.