Month: October 2013

How to (properly) publish a vocabulary or ontology on the web (part 5 of 6)

This week I want to quickly introduce how and why you should include a license in your vocabulary and its documentation. Since this subject has already been dealt with elsewhere, I will mainly provide links to posts that describe these matters in detail.

Why should you add a license to your ontologies? Because if others want to reuse your vocabulary or ontology, the license clarifies what they are allowed to do with it according to the law (for instance, whether they have to give attribution to your work). Remember that you are the intellectual author and you hold the rights over the resource being published. See more details and types of licenses here.

How can you specify a license? You can add it as a semantic description of the ontology/vocabulary. Two widely used properties are dc:rights and dc:license, from the Dublin Core vocabulary. These properties can be used to annotate the OWL file being produced, or the documentation itself with annotations in RDFa or microdata. See how it can be done here.
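For illustration, here is a minimal sketch in Turtle of what these annotations could look like. The ontology URI is a placeholder, and the dc: prefix is bound here to the Dublin Core terms namespace, where both properties are defined:

@prefix dc:  <http://purl.org/dc/terms/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .

# Placeholder URI; use the URI of your own vocabulary here
<http://example.org/def/myvocab> a owl:Ontology ;
    # Human-readable rights statement
    dc:rights "Copyright 2013 by the authors. Some rights reserved." ;
    # Machine-readable pointer to the license itself
    dc:license <http://creativecommons.org/licenses/by-nc-sa/2.0/> .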

Spend some time analyzing which license is the most appropriate for your work. It may help you and many others in the future! If you are unsure which license to use, this is the one we use for our vocabularies: http://creativecommons.org/licenses/by-nc-sa/2.0/.

This is part of a tutorial divided into 7 parts:

  1. Overview of the tutorial.
  2. (Reqs addressed A1(partially), A2, A3, A4, P1) Publishing your vocabulary at a stable URI using RDFS/OWL.
  3. (Reqs addressed P2, P3). How to design a human readable documentation.
  4. Extra: A tool for creating html readable documentation
  5. (Reqs addressed P4). Dereferencing your vocabulary
  6. (Reqs addressed A1 (partially)). Dealing with the license. (this post)
  7. (Reqs addressed A5, P5). Reusing other vocabularies. (To appear)

TPDL: Malta and the knights of the digital libraries

Apparently September was the month of library conferences. First the DC-iPRES conference took place during the first week of the month, and then the Theory and Practice of Digital Libraries conference (TPDL) was held from the 22nd to the 26th in Malta. I recently realized that I forgot to add my summary of TPDL, so my highlights can be found below.

On this occasion my main reasons for attending the conference were a tutorial related to the Research Object models and a workshop about scholarly communication. The tutorial was given in collaboration with people from the Timbus project, who are doing a great job on the preservation of workflows as runnable software components. Have a look at our slides and video for more information.

In general, the impression I got is that despite its name, TPDL is a very technology-oriented event. Linked Data was a hot topic, but user interfaces, mining algorithms, classification, preservation and visualization approaches were also discussed for the library domain. Another curious fact is that many of the talks and papers were related to the Europeana project's data or models. I had no idea of the size of the project, which is drawing contributions from a huge number of institutions all over Europe.

Since there were many parallel sessions, my highlights won’t cover everything. If you want more information you can see the whole program here.

My highlights:

  • The presentation on digital libraries for experimental data, where a system for capturing the scripts used within a series of experiments was presented (similar to ReproZip), which also used the Open Provenance Model to track the provenance of data in the platform.
  • The COST actions for Digital Libraries, which serve to create networks of researchers all over the world.
  • An interesting map-based visualization using hierarchies and Europeana data with a layered approach (see more here)
  • The project presented in the session “Using Requirements in Audio Visual Research, a quantitative approach”, which will link together fragments of videos (from a repository of more than 800k hours) and annotate them. I asked the person responsible whether the data would be made available or not, but for the moment it doesn’t look like it. Very cool ideas though, and very useful for journalists and regular users.
  • The semantic hierarchical structuring of cultural heritage objects, done with Europeana data to group resources that refer to the same “thing” using their metadata (for example, to detect duplicates and several different views (pictures) of the same object). Very useful for curating the data, but it lacked a comparison with other clustering methods, which should be done in the future.
  • The keynote by Sören Auer, where he presented several of the Linked Data-aware applications that he and his group have been developing and how they could help librarians in different ways. OntoWiki was the most complete one: a semantic wiki for creating portals and annotating them according to the Linked Data principles (including content negotiation for each of its pages).
  • The “Resurrecting My Revolution” paper, regarding the tweets and links that go missing on the web and how to archive and preserve them properly. This presentation in particular focused on tweets that referenced images that no longer exist (e.g., those taken during the Green Revolution in Iran).
  • A nice motivational presentation by Sarah Callaghan on data citation, why we need it and why we should have it. More details here.
  • The Investigation Research Objects being created in the SCAPE project, based on the foundations laid by wf4Ever and combining them with persistent identifiers like DOIs.

Finally, I wouldn’t like to finish without mentioning that the organizers were given the title of Knights of the Digital Libraries, which was very well received by everyone at the conference. Below you can see some pictures of the ceremony, along with one of Malta’s National Library.

The ceremony of the Knights
The Medina, at Malta
Malta’s main Library

How to (properly) publish a vocabulary or ontology on the web (part 4 of 6)

(Update: purl.org seems to have stopped working. I recommend having a look at my latest post for doing content negotiation with w3id.)

After a long summer break in blogging, I’m committed to finishing this tutorial. In this post I’ll explain why and how to dereference your vocabulary when publishing it on the Web.

But first, why should you dereference your vocabulary? In part 2 I showed how to create a permanent URL (purl) and redirect it to the ontology/vocabulary we wanted to publish (in my case it was http://purl.org/net/wf-motifs). If you followed the example, entering the purl you created in a web browser redirects you to the ontology file. However, if you now enter http://purl.org/net/wf-motifs in the browser you will be redirected to the HTML documentation of the ontology, while entering the same URL in Protégé loads the ontology file into the system. By dereferencing the Motifs vocabulary I am able to choose what to deliver depending on the type of request the server receives for a single resource: RDF files for applications and nice HTML pages for the people looking for information about the ontology (structured content for machines, human-readable content for users).

Additionally, if you have used the tools I suggested in previous posts, when you ask for a certain concept the browser will take you to the exact part of the document defining it. For example, if you want to know the exact definition of the concept “FormatTransformation” in the Workflow Motifs ontology, you can paste its URI (http://purl.org/net/wf-motifs#FormatTransformation) into the web browser. This makes life easier for users when browsing and reading your ontology.
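Note that this works because with a hash namespace every term is a fragment of the same document: the fragment is never sent to the server, so the browser requests the vocabulary URI itself, follows the redirection to the HTML documentation, and then jumps to the anchor matching the fragment. A minimal Turtle sketch of the idea (the class and label below are illustrative, not the actual definitions from the ontology):

@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix :     <http://purl.org/net/wf-motifs#> .

# Every term is identified relative to the same document URI: requesting
# :FormatTransformation actually fetches http://purl.org/net/wf-motifs
:FormatTransformation a owl:Class ;
    rdfs:label "Format Transformation" .  # illustrative annotation only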

And now, how do you dereference your vocabulary? First, you should set the purl redirection as a redirection for Semantic Web resources (add a 303 redirection instead of a 302, and add the target URL where you plan to do the redirection). Note that you can only dereference a resource if you control the server from which the resources are going to be delivered. The screenshot below shows how it would look in purl for the Workflow Motifs vocabulary; http://vocab.linkeddata.es/motifs is the place where our system admin decided to store the vocabulary.

Purl redirection
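To make the difference concrete, this is roughly the exchange a Linked Data client would see once the 303 redirection is in place (an illustrative trace based on the setup above, not captured from the actual server):

GET /net/wf-motifs HTTP/1.1
Host: purl.org
Accept: application/rdf+xml

HTTP/1.1 303 See Other
Location: http://vocab.linkeddata.es/motifs

The 303 (“See Other”) status tells the client where to retrieve a description of the resource without implying that the vocabulary URI identifies an ordinary web document, which is why the W3C recommends it over a plain 302 for Semantic Web resources.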

Now you should add the redirection itself. For this I always recommend having a look at the W3C documents, which will guide you step by step on how to achieve this. In this particular case we followed http://www.w3.org/TR/swbp-vocab-pub/#recipe3, which is a simple redirection for vocabularies with a hash namespace. You have to create an .htaccess file similar to the one pasted below. In my case the index.html file contains the documentation of the ontology, while motif-ontology1.1.owl contains the RDF/XML encoding. If a ttl file exists, you can also add the appropriate content negotiation for it. All the files are located in a folder called motifs-content, in order to avoid an infinite loop when dealing with the redirections of the vocabulary:

# Turn off MultiViews
Options -MultiViews

# Directive to ensure *.rdf files served as appropriate content type,
# if not present in main apache config
AddType application/rdf+xml .rdf
AddType application/rdf+xml .owl
#AddType text/turtle .ttl #<---Add if you have a ttl serialization of the file

# Rewrite engine setup
RewriteEngine On
RewriteBase /def

# Rewrite rule to serve HTML content from the vocabulary URI if requested
RewriteCond %{HTTP_ACCEPT} !application/rdf\+xml.*(text/html|application/xhtml\+xml)
RewriteCond %{HTTP_ACCEPT} text/html [OR]
RewriteCond %{HTTP_ACCEPT} application/xhtml\+xml [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla/.*
RewriteRule ^motifs$ motifs-content/index.html

# Rewrite rule to serve RDF/XML content from the vocabulary URI if requested
RewriteCond %{HTTP_ACCEPT} application/rdf\+xml
RewriteRule ^motifs$ motifs-content/motif-ontology1.1.owl [R=303]

# Rewrite rule to serve turtle content from the vocabulary URI if requested
#RewriteCond %{HTTP_ACCEPT} text/turtle
#RewriteRule ^motifs$ motifs-content/motifs_ontology-ttl.ttl [R=303]

# Choose the default response
# ---------------------------

# Rewrite rule to serve the RDF/XML content from the vocabulary URI by default
RewriteRule ^motifs$ motifs-content/motif-ontology1.1.owl [R=303]

Note the 303 redirections when the OWL file is being requested. If you have a slash vocabulary, you will have to follow the aforementioned W3C document for further instructions.

Now it is time to test that everything works. The easiest way is to paste the URI of the ontology into Protégé and into your browser, and check that in one case it loads the ontology properly and in the other you can see the documentation. Another possibility is to use curl:

curl -sH "Accept: application/rdf+xml" -L http://purl.org/net/wf-motifs (for checking that the RDF is obtained)
curl -sH "Accept: text/html" -L http://purl.org/net/wf-motifs (for the HTML)

Finally, you may also use the Vapour validator to check that you have done the process correctly. After entering your ontology URL, you should see something like this:

Vapour validation

Congratulations! You have dereferenced your vocabulary successfully 🙂

This is part of a tutorial divided into 7 parts:

  1. Overview of the tutorial.
  2. (Reqs addressed A1(partially), A2, A3, A4, P1) Publishing your vocabulary at a stable URI using RDFS/OWL.
  3. (Reqs addressed P2, P3). How to design a human readable documentation.
  4. Extra: A tool for creating html readable documentation
  5. (Reqs addressed P4). Dereferencing your vocabulary (this post)
  6. (Reqs addressed A1 (partially)). Dealing with the license. (To appear)
  7. (Reqs addressed A5, P5). Reusing other vocabularies. (To appear)