Category: Linked Data

How to (easily) publish your ontology permanently: OnToology and w3id

I recently realized that I haven’t published a post for a while, and I can’t think of a better way to start 2017 than with a small tutorial: how to mint w3ids for your ontologies without having to issue pull requests on GitHub.

In a previous post I described how to publish vocabularies and ontologies in a permanent manner using w3ids. These ids are community-maintained and very flexible, but I have found that submitting pull requests to the w3id repository can be a hurdle for many people. Hence, I have been working towards lowering this barrier.

Together with some colleagues from the Universidad Politecnica de Madrid, a year and a half ago we released OnToology, a tool to help document and evaluate ontologies. Given a GitHub repository, OnToology tracks all your updates and issues pull requests with their documentation, diagrams and evaluation. You can follow a step-by-step tutorial to set up and try OnToology with the ontologies of your choice. The rest of this tutorial assumes that your ontology is tracked by OnToology.

So, how can you mint w3ids from OnToology? Simple: go to the “my repositories” tab:

[Screenshot: the “my repositories” tab]

Then expand your repository:

[Screenshot: an expanded repository]

And select “publish” on the ontology for which you want to mint a w3id:

[Screenshot: the “publish” option for an ontology]

Now OnToology will ask for a name for your URI, and that’s it! The ontology will be published under the w3id that appears below the ontology you selected. In my case, I chose to publish the wgs84 ontology under the name “wgstest”:

[Screenshot: the published ontology with its w3id]

As shown in the figure, the ontology will be published under “https://w3id.org/def/wgstest”.

If you update the HTML in GitHub and want the published version to reflect the changes, click the “republish” button that now replaces the old “publish” one:

[Screenshot: the “republish” button]

Right now the ontologies are published on the OnToology server, but we will soon enable publication on GitHub through GitHub Pages. If you want the w3id to point somewhere else, you can either contact us at ontoology@delicias.dia.fi.upm.es or issue a pull request to w3id that adds your redirection before the 302 redirection in our “def” namespace: https://github.com/perma-id/w3id.org/blob/master/def/.htaccess
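Why “before”? Assuming each rule in that file carries mod_rewrite’s [L] flag, as redirect rules commonly do, the first matching rule wins and later rules are never consulted, so a catch-all placed earlier would swallow your redirection. The Python sketch below mimics that first-match behavior. The “mytool” rule, the catch-all pattern and every target URL are hypothetical placeholders, not the actual contents of the w3id file.

```python
import re
from typing import List, Optional, Tuple

# (pattern on the path after /def/, target template, HTTP status)
Rule = Tuple[str, str, int]

# Hypothetical rules for illustration only; the real w3id file differs.
RULES: List[Rule] = [
    # A user-specific redirection, placed before the catch-all.
    (r"^mytool(.*)$", r"https://example.org/mytool\1", 302),
    # Catch-all redirection to a publishing server (placeholder host).
    (r"^(.*)$", r"https://ontoology.example.org/publish/\1", 302),
]


def resolve(path: str, rules: List[Rule] = RULES) -> Optional[Tuple[str, int]]:
    """Return (location, status) from the first rule whose pattern matches,
    mirroring mod_rewrite rules that carry the [L] flag."""
    for pattern, target, status in rules:
        match = re.match(pattern, path)
        if match:
            return match.expand(target), status
    return None


print(resolve("mytool"))   # handled by the user rule
print(resolve("wgstest"))  # falls through to the catch-all
# With the catch-all listed first, the user rule would never fire:
print(resolve("mytool", list(reversed(RULES))))
```

In an actual .htaccess file the back-reference is written $1 rather than Python’s \1.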

Elevator pitch

During my PhD, many people have asked me about the subject of my thesis and the main ideas behind my research. As a student you always think you are very clear about what you are doing, at least until you have to explain it to someone outside your domain. In fact, it is all about using the right terminology. If you say something like “Oh yeah, I am trying to detect abstractions on scientific workflows semi-automatically in order to understand how they can better be reused and related to each other”, people will look at you as if you didn’t belong to this planet. Instead, something like “detecting commonalities in scientific experiments in order to study how we can understand them better” might be more appropriate.

But last week the challenge was slightly different. I was invited to give an overview talk about the work I have been doing as a PhD student: not only what I am doing, but why I am doing it and how it all fits together, without going into the details of every step. It may seem like an easy task, but it kept me thinking more than I expected.

As I think some people might be interested in a global overview, I want to share the presentation here as well: http://www.slideshare.net/dgarijo/from-scientific-workflows-to-research-objects-publication-and-abstraction-of-scientific-experiments. Have a look!

Rohub.linkeddata.es

For the last 3 years I have been involved in the Wf4Ever project, which has developed the notion of Research Objects and their respective models (introduced in a previous post). Lately I have been exploring new ways of eating my own dog food by associating Research Objects with my papers as HTML web pages (see an example here). These Research Objects are useful: they serve as a summary of the paper in question, and they have pointers to all the datasets, queries and additional materials that I could not include in the paper.

However, I realized that I was spending a lot of time creating and annotating them. Therefore, last Christmas I created a Research Object Creator tool, which takes a LaTeX file as input and extracts its title and abstract to create a page annotated in rdf-a. It also produces an outline of the contents to reference, so you only have to fill in (and annotate if you want) the resources to point to. A sample can be seen in the image below:

A Sample Research Object generated by the tool
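The core idea of the tool can be sketched in a few lines of Python. This is not the Latex2RO code: it is a minimal illustration that assumes a flat \title{...} (no nested braces) and a standard abstract environment, and it annotates the output with DCMI terms (dcterms:title, dcterms:abstract), which is an illustrative vocabulary choice rather than necessarily the one the tool uses. The URI http://example.org/ro/1 is a placeholder.

```python
import html
import re


def extract_metadata(latex: str) -> dict:
    """Pull the title and abstract out of a LaTeX source string.

    Assumes a flat \\title{...} and a standard abstract environment;
    returns empty strings when either is missing.
    """
    title = re.search(r"\\title\{([^{}]*)\}", latex)
    abstract = re.search(r"\\begin\{abstract\}(.*?)\\end\{abstract\}", latex, re.DOTALL)
    return {
        "title": title.group(1).strip() if title else "",
        # Collapse LaTeX line breaks and repeated spaces into single spaces.
        "abstract": " ".join(abstract.group(1).split()) if abstract else "",
    }


def to_rdfa(meta: dict, about: str) -> str:
    """Render a minimal HTML fragment annotated with RDFa attributes."""
    return (
        f'<div prefix="dcterms: http://purl.org/dc/terms/" about="{html.escape(about)}">\n'
        f'  <h1 property="dcterms:title">{html.escape(meta["title"])}</h1>\n'
        f'  <p property="dcterms:abstract">{html.escape(meta["abstract"])}</p>\n'
        "</div>"
    )


sample = r"""
\documentclass{article}
\title{Abstracting Scientific Workflows}
\begin{document}
\begin{abstract}
We study   common
patterns in workflows.
\end{abstract}
\end{document}
"""
meta = extract_metadata(sample)
print(to_rdfa(meta, "http://example.org/ro/1"))
```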

The tool is available on GitHub, so if you want to try it out with a LaTeX paper, follow this link: https://github.com/dgarijo/Latex2RO.

Finally, I have also created a landing page showing the current catalog of Research Objects: http://rohub.linkeddata.es/. The page is generated automatically: given the URI of a Research Object, it extracts its title and abstract from the rdf-a descriptions. If you want to contribute new URIs, modify the Constants file in the GitHub project (https://github.com/dgarijo/rohub.linkeddata.es/tree/master/src/main/java/com/oeg/rohubweb) and I will regenerate the landing page. Note that for this project I have used the Semargl rdf-a parser (http://semarglproject.org/), which is somewhat strict when parsing HTML pages: if your Research Object has any markup mistakes, the parser will fail.
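For comparison, a deliberately lenient version of the extraction step might look like the Python sketch below. It is not how the landing page works (that project relies on Semargl): it assumes the pages mark their title and abstract with property="dcterms:title" and property="dcterms:abstract", collects only the text of those elements, and quietly closes elements left open instead of failing. A full RDFa parser does much more (prefix expansion, subjects, datatypes).

```python
from html.parser import HTMLParser
from typing import Dict, List

# Elements that never have an end tag in HTML.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class PropertyTextCollector(HTMLParser):
    """Collect the text of elements whose RDFa property attribute is wanted.

    Deliberately lenient: no CURIE expansion, no about/resource tracking,
    and unclosed elements are closed implicitly instead of raising errors.
    """

    def __init__(self, wanted: List[str]) -> None:
        super().__init__()
        self.wanted = set(wanted)
        self.found: Dict[str, str] = {}
        self._open: List[list] = []  # [tag, wanted property or None, text parts]

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        prop = dict(attrs).get("property")
        self._open.append([tag, prop if prop in self.wanted else None, []])

    def handle_data(self, data):
        # Text belongs to every enclosing element that carries a wanted property.
        for _tag, prop, parts in self._open:
            if prop:
                parts.append(data)

    def handle_endtag(self, tag):
        # Close the most recent matching element plus anything left open inside it.
        for i in range(len(self._open) - 1, -1, -1):
            if self._open[i][0] == tag:
                closed = self._open[i:]
                del self._open[i:]
                self._record(closed)
                return
        # A stray end tag with no matching start tag is ignored.

    def close(self):
        super().close()
        self._record(self._open)  # elements still open at the end of input
        self._open = []

    def _record(self, frames):
        for _tag, prop, parts in frames:
            if prop and prop not in self.found:
                self.found[prop] = " ".join("".join(parts).split())


def extract(page: str) -> Dict[str, str]:
    parser = PropertyTextCollector(["dcterms:title", "dcterms:abstract"])
    parser.feed(page)
    parser.close()
    return parser.found


page = (
    '<div about="http://example.org/ro/1">'
    '<h1 property="dcterms:title">Workflow <em>motifs</em></h1>'
    '<p property="dcterms:abstract">A short abstract.'  # <p> is never closed
    "</div>"
)
print(extract(page))
```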

How to (properly) publish a vocabulary or ontology in the web (part 6 of 6)

And we finally arrive at the last part of the tutorial: a set of guidelines on how to reuse other vocabularies (i.e., how your vocabulary should link to other vocabularies). Reuse is related not only to publication, but also to the design of your own vocabulary. As researchers, we all know that it is better not to reinvent the wheel. If an existing vocabulary already covers part of what your competency questions (or system requirements) need, why redefine the same terms again and again?

To avoid this, you can either import that vocabulary into yours, which brings the whole imported vocabulary into your ontology (like a module), or extend only those properties and classes that you are going to reuse, without adding all the terms of the reused vocabulary to your ontology.
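To make the two options concrete, the Python helper below just prints the Turtle each strategy would produce for a hypothetical ontology at http://example.org/def/myvocab that reuses PROV. The import target http://www.w3.org/ns/prov-o is an assumption; check the IRI declared by whichever ontology you actually reuse.

```python
PREFIXES = (
    "@prefix owl: <http://www.w3.org/2002/07/owl#> .\n"
    "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .\n"
    "@prefix prov: <http://www.w3.org/ns/prov#> .\n"
)


def reuse_fragment(ontology_iri: str, strategy: str) -> str:
    """Return Turtle showing how an ontology reuses PROV under one strategy."""
    head = PREFIXES + f"@prefix ex: <{ontology_iri}#> .\n\n"
    if strategy == "import":
        # Pulls the entire imported ontology into yours, like a module.
        return head + (f"<{ontology_iri}> a owl:Ontology ;\n"
                       "    owl:imports <http://www.w3.org/ns/prov-o> .\n")
    if strategy == "extend":
        # Only references prov:Entity; the rest of PROV stays outside.
        return head + (f"<{ontology_iri}> a owl:Ontology .\n\n"
                       "ex:DigitalEntity a owl:Class ;\n"
                       "    rdfs:subClassOf prov:Entity .\n")
    raise ValueError(f"unknown strategy: {strategy!r}")


print(reuse_fragment("http://example.org/def/myvocab", "import"))
print(reuse_fragment("http://example.org/def/myvocab", "extend"))
```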

Which way is better? It depends. On the one hand, I personally prefer to extend the vocabularies I reuse when only a few terms are being extended. Importing a vocabulary often makes the ontology harder to present, and someone loading it may find it confusing to browse through many terms that are not used in my domain.

Reusing concepts from other ontologies simplifies your domain model, as you import just those being extended.

On the other hand, if you plan to reuse most of the vocabulary being imported, for example by creating a profile of a vocabulary for a specific domain, the import option is the way to go.

Importing an ontology makes it on one hand easy to reuse, but on the other (in some cases) it makes your ontology more difficult to understand.

Another piece of advice is to be careful with the semantics. I personally don’t like to mess with concepts defined by other people. If you need to add your own properties whose domain or range are classes defined by others, you should specialize those classes in your ontology. Imagine that I want to reuse the generic concept prov:Entity from the PROV ontology to describe the provenance of digital entities (my sample domain). If I want to add a property whose domain is a digital entity (like hasSize), then I should specialize prov:Entity with a subclass for my domain (in this case, digitalEntity subClassOf prov:Entity). If I just assert properties on the general term (prov:Entity), I may be extending my property to domains I never intended and, what is worse, I may be modifying a model that I did not define.
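The difference shows up once a reasoner applies the RDFS domain rule (rdfs2) and the subclass rule (rdfs9). The Python sketch below applies just those two rules to a handful of triples; ex:DigitalEntity, ex:hasSize and ex:file1 are hypothetical names for the example above. With the specialized class, ex:file1 is typed as both ex:DigitalEntity and prov:Entity, so the domain-specific meaning lives in your own namespace while PROV consumers still see an entity. With the generic design, all a reasoner learns is that anything with a size is a prov:Entity, i.e. the property is claimed for every PROV entity.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]


def rdfs_closure(triples: Set[Triple]) -> Set[Triple]:
    """Apply RDFS rules rdfs2 (domain) and rdfs9 (subClassOf) to a fixpoint."""
    closure = set(triples)
    while True:
        new: Set[Triple] = set()
        domains = [(p, c) for (p, r, c) in closure if r == "rdfs:domain"]
        subclasses = [(a, b) for (a, r, b) in closure if r == "rdfs:subClassOf"]
        for s, p, _o in closure:
            for prop, cls in domains:
                if p == prop:
                    new.add((s, "rdf:type", cls))  # rdfs2
        for s, p, o in closure:
            if p == "rdf:type":
                for sub, sup in subclasses:
                    if o == sub:
                        new.add((s, "rdf:type", sup))  # rdfs9
        if new <= closure:
            return closure
        closure |= new


def types_of(subject: str, triples: Set[Triple]) -> Set[str]:
    return {o for (s, p, o) in rdfs_closure(triples) if s == subject and p == "rdf:type"}


SPECIALIZED = {
    ("ex:DigitalEntity", "rdfs:subClassOf", "prov:Entity"),
    ("ex:hasSize", "rdfs:domain", "ex:DigitalEntity"),
    ("ex:file1", "ex:hasSize", '"2048"'),
}
GENERIC = {
    ("ex:hasSize", "rdfs:domain", "prov:Entity"),
    ("ex:file1", "ex:hasSize", '"2048"'),
}

print(types_of("ex:file1", SPECIALIZED))
print(types_of("ex:file1", GENERIC))
```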

But where to start looking if you want to reuse a vocabulary? There are several options:

  • Linked Open Vocabularies (LOV): A set of common vocabularies, distributed and organized in different categories. Different metrics are displayed for each vocabulary regarding its metadata and reuse, which will help you determine whether it is still in use.
  • The W3C standards: When building a vocabulary it is always good to check whether a standard already exists for that domain!
  • Swoogle and Watson allow you to search for terms in your domain and suggest existing approaches.

With this post the tutorial ends. I hope it has helped clarify at least a couple of things regarding vocabulary/ontology publication on the web. If you have any questions, please leave them in the comments and I’ll be happy to help.

Do you want more information regarding ontology importing and reuse? Check out these papers (thanks Maria and Melanie for the pointers):

This is part of a tutorial divided into 7 parts:

  1. Overview of the tutorial.
  2. (Reqs addressed A1(partially), A2, A3, A4, P1) Publishing your vocabulary at a stable URI using RDFS/OWL.
  3. (Reqs addressed P2, P3). How to design a human readable documentation.
  4. Extra: A tool for creating html readable documentation
  5. (Reqs addressed P4). Dereferencing your vocabulary
  6. (Reqs addressed A1 (partially)). Dealing with the license
  7. (Reqs addressed A5, P5). Reusing other vocabularies. (This post)

How to (properly) publish a vocabulary or ontology in the web (part 5 of 6)

This week I want to quickly introduce how and why you should include a license in your vocabulary and documentation. Since this subject has already been covered elsewhere, I am mainly going to provide links to posts describing these matters in detail.

Why should you add a license to your ontologies? Because if others want to reuse your vocabulary or ontology, the license clarifies what they are legally allowed to do with it (for instance, whether they have to give attribution to your work). Remember that you are the intellectual author and you hold the rights over the resource being published. See more details and types of licenses here.

How can you specify a license? You can add it as a semantic description of the ontology/vocabulary. Two widely used properties come from Dublin Core: dc:rights and dcterms:license (the latter is defined in the DCMI Metadata Terms namespace). These properties can be used to describe the OWL file being produced, or in the documentation itself with annotations in RDF-a or microdata. See how it can be done here.
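As a minimal sketch, an ontology header written in Turtle could carry the license as a dcterms:license statement (DCMI Metadata Terms) plus a free-text dcterms:rights note. The Python helper below only assembles that string; the ontology IRI and the rights text are hypothetical, and the license URI is the one recommended in this post.

```python
def turtle_literal(text: str) -> str:
    """Quote a string as a Turtle literal, escaping backslashes, quotes and newlines."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return '"' + escaped + '"'


def license_header(ontology_iri: str, license_uri: str, rights: str) -> str:
    """Build a Turtle ontology header annotated with licensing metadata."""
    return (
        "@prefix owl: <http://www.w3.org/2002/07/owl#> .\n"
        "@prefix dcterms: <http://purl.org/dc/terms/> .\n\n"
        f"<{ontology_iri}> a owl:Ontology ;\n"
        f"    dcterms:license <{license_uri}> ;\n"
        f"    dcterms:rights {turtle_literal(rights)} .\n"
    )


header = license_header(
    "http://example.org/def/myvocab",  # hypothetical ontology IRI
    "http://creativecommons.org/licenses/by-nc-sa/2.0/",
    "Copyright the authors of myvocab",  # hypothetical rights statement
)
print(header)
```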

Spend some time analyzing which license is most appropriate for your work. It may help you and many others in the future! If you are unsure which license to use, this is the one we use for our vocabularies: http://creativecommons.org/licenses/by-nc-sa/2.0/.

This is part of a tutorial divided into 7 parts:

  1. Overview of the tutorial.
  2. (Reqs addressed A1(partially), A2, A3, A4, P1) Publishing your vocabulary at a stable URI using RDFS/OWL.
  3. (Reqs addressed P2, P3). How to design a human readable documentation.
  4. Extra: A tool for creating html readable documentation
  5. (Reqs addressed P4). Dereferencing your vocabulary
  6. (Reqs addressed A1 (partially)). Dealing with the license. (this post)
  7. (Reqs addressed A5, P5). Reusing other vocabularies. (To appear)

How to (properly) publish a vocabulary or ontology in the web (part 4 of 6)

(Update: purl.org seems to have stopped working. I recommend having a look at my latest post on content negotiation with w3id.)

After a long summer break in blogging, I’m committed to finishing this tutorial. In this post I’ll explain why and how to dereference your vocabulary when publishing it in the Web.

But first, why should you dereference your vocabulary? In part 2 I showed how to create a permanent URL (purl) and redirect it to the ontology/vocabulary we wanted to publish (in my case, http://purl.org/net/wf-motifs). If you followed the example, you will have noticed that entering the purl you created in a web browser redirects you to the ontology file. However, if you enter http://purl.org/net/wf-motifs you will be redirected to the HTML documentation of the ontology, while entering the same URL in Protégé loads the ontology file. By dereferencing the motifs vocabulary I can choose what to deliver from a single resource depending on the type of request the server receives: RDF files for applications and nice HTML pages for people looking for information about the ontology (structured content for machines, human-readable content for users).

Additionally, if you have used the tools I suggested in previous posts, when you ask for a certain concept the browser will take you to the exact part of the document that defines it. For example, if you want to know the exact definition of the concept “FormatTransformation” in the Workflow Motif ontology, you can paste its URI (http://purl.org/net/wf-motifs#FormatTransformation) into the web browser. This makes life easier for users when browsing and reading your ontology.
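This works for hash namespaces because the fragment after # never reaches the server: the client requests the vocabulary URI itself and then jumps to the matching anchor in the page it receives. A short illustration with the Python standard library’s urldefrag:

```python
from urllib.parse import urldefrag


def request_target(term_uri: str):
    """Split a hash-namespace term URI into the document the client
    actually requests and the fragment it resolves locally."""
    document, fragment = urldefrag(term_uri)
    return document, fragment


print(request_target("http://purl.org/net/wf-motifs#FormatTransformation"))
```

Since every term shares the same document URI, a single server-side redirection rule for the vocabulary covers all of its terms.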

And now, how do you dereference your vocabulary? First, set the purl redirection as a redirection for Semantic Web resources (use a 303 redirection instead of a 302, and add the target URL where you plan to do the redirection). Note that you can only dereference a resource if you control the server from which the resources are going to be delivered. The screenshot below shows how it would look in purl for the Workflow Motifs vocabulary; http://vocab.linkeddata.es/motifs is where our system admin decided to store the vocabulary.

Purl redirection

Now you should add the redirection itself. For this I always recommend having a look at the W3C documents, which will guide you step by step. In this case in particular we followed http://www.w3.org/TR/swbp-vocab-pub/#recipe3, which is a simple redirection for vocabularies with a hash namespace. You have to create an htaccess file similar to the one pasted below. In my case the index.html file contains the documentation of the ontology, while motif-ontology1.1.owl contains the RDF/XML encoding. If a ttl file exists, you can also add the appropriate content negotiation. All the files are located in a folder called motifs-content, to avoid an infinite loop when dealing with the redirections of the vocabulary:

# Turn off MultiViews
Options -MultiViews

# Directive to ensure *.rdf files served as appropriate content type,
# if not present in main apache config
AddType application/rdf+xml .rdf
AddType application/rdf+xml .owl
# Uncomment the next line if you have a ttl serialization of the file
#AddType text/turtle .ttl

# Rewrite engine setup
RewriteEngine On
RewriteBase /def

# Rewrite rule to serve HTML content from the vocabulary URI if requested
RewriteCond %{HTTP_ACCEPT} !application/rdf\+xml.*(text/html|application/xhtml\+xml)
RewriteCond %{HTTP_ACCEPT} text/html [OR]
RewriteCond %{HTTP_ACCEPT} application/xhtml\+xml [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla/.*
RewriteRule ^motifs$ motifs-content/index.html

# Rewrite rule to serve RDF/XML content from the vocabulary URI if requested
RewriteCond %{HTTP_ACCEPT} application/rdf\+xml
RewriteRule ^motifs$ motifs-content/motif-ontology1.1.owl [R=303]

# Rewrite rule to serve turtle content from the vocabulary URI if requested
#RewriteCond %{HTTP_ACCEPT} text/turtle
#RewriteRule ^motifs$ motifs-content/motifs_ontology-ttl.ttl [R=303]

# Choose the default response
# ---------------------------

# Rewrite rule to serve the RDF/XML content from the vocabulary URI by default
RewriteRule ^motifs$ motifs-content/motif-ontology1.1.owl [R=303]

Note that the rules issue a 303 redirection whenever the OWL file is requested, while the HTML documentation is served in place. If you have a slash vocabulary, follow the aforementioned W3C document for further instructions.
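To see how the conditions combine, the Python sketch below mirrors the decision logic of the .htaccess rules above, with the turtle rule left disabled as in the file. It is an approximation for reasoning about the rules, not a substitute for testing the server. It follows mod_rewrite’s semantics as I understand them: condition patterns match anywhere in the string unless anchored, and the three conditions chained with [OR] form one group that is ANDed with the first (negated) condition.

```python
import re
from typing import Tuple

OWL = "motifs-content/motif-ontology1.1.owl"
HTML = "motifs-content/index.html"


def negotiate(accept: str, user_agent: str = "") -> Tuple[str, int]:
    """Mirror the .htaccess rules for a request to .../motifs.

    Returns (target, status): 200 means the file is served in place
    (internal rewrite, no R flag); 303 means a See Other redirection.
    """
    # RewriteCond 1 (negated): RDF/XML must not be listed before an HTML type.
    rdf_before_html = re.search(
        r"application/rdf\+xml.*(text/html|application/xhtml\+xml)", accept)
    # RewriteConds 2-4 are chained with [OR].
    html_ok = (re.search(r"text/html", accept)
               or re.search(r"application/xhtml\+xml", accept)
               or re.match(r"Mozilla/.*", user_agent))
    if not rdf_before_html and html_ok:
        return HTML, 200
    if re.search(r"application/rdf\+xml", accept):
        return OWL, 303
    return OWL, 303  # default response when nothing else matched


browser = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"
print(negotiate(browser, "Mozilla/5.0"))
print(negotiate("application/rdf+xml"))
print(negotiate("application/rdf+xml, text/html"))
print(negotiate("application/rdf+xml", "Mozilla/5.0"))
```

The last call is worth noticing: a client whose User-Agent starts with Mozilla/ receives the HTML even when it asks only for application/rdf+xml, because the User-Agent alternative of the [OR] group matches.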

Now it is time to test that everything works. The easiest way is to paste the URI of the ontology into Protégé and into your browser, and check that in the first case the ontology loads properly and in the second you can see the documentation. Another possibility is to use curl like this: curl -sH "Accept: application/rdf+xml" -L http://purl.org/net/wf-motifs (to check that the RDF is obtained) or curl -sH "Accept: text/html" -L http://purl.org/net/wf for the HTML.
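The same check can be scripted with the Python standard library. Because the sketch cannot rely on the real vocabulary server, it starts a throwaway local server that imitates the expected behavior in a simplified way (HTML in place, otherwise a 303 to the OWL file) purely so the checker has something to talk to; the fake server and its file paths are illustrative assumptions. To test a real deployment, call check() with your own vocabulary URI instead.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Tuple

# Proxies are disabled so that the local demo below is not routed through one.
OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def check(url: str, accept: str) -> Tuple[str, str]:
    """Fetch url with an Accept header, following redirects like curl -L,
    and return (final URL, Content-Type)."""
    req = urllib.request.Request(url, headers={"Accept": accept})
    with OPENER.open(req, timeout=10) as resp:
        return resp.geturl(), resp.headers.get_content_type()


class FakeVocabServer(BaseHTTPRequestHandler):
    """Simplified stand-in for the vocabulary server (demo only)."""

    def do_GET(self):
        if self.path == "/motifs" and "text/html" not in self.headers.get("Accept", ""):
            self.send_response(303)
            self.send_header("Location", "/motifs-content/motif-ontology1.1.owl")
            self.send_header("Content-Length", "0")
            self.end_headers()
            return
        ctype = "application/rdf+xml" if self.path.endswith(".owl") else "text/html"
        body = b"<placeholder/>"
        self.send_response(200)
        self.send_header("Content-Type", ctype)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


server = HTTPServer(("127.0.0.1", 0), FakeVocabServer)
threading.Thread(target=server.serve_forever, daemon=True).start()
base = f"http://127.0.0.1:{server.server_address[1]}"

rdf_result = check(base + "/motifs", "application/rdf+xml")
html_result = check(base + "/motifs", "text/html")
server.shutdown()
server.server_close()
print(rdf_result)
print(html_result)
```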

Finally, you may also use the Vapour validator to check that you have done the process correctly. After entering your ontology URL, you should see something like this:

Vapour validation

Congratulations! You have dereferenced your vocabulary successfully 🙂

This is part of a tutorial divided into 7 parts:

  1. Overview of the tutorial.
  2. (Reqs addressed A1(partially), A2, A3, A4, P1) Publishing your vocabulary at a stable URI using RDFS/OWL.
  3. (Reqs addressed P2, P3). How to design a human readable documentation.
  4. Extra: A tool for creating html readable documentation
  5. (Reqs addressed P4). Dereferencing your vocabulary (this post)
  6. (Reqs addressed A1 (partially)). Dealing with the license. (To appear)
  7. (Reqs addressed A5, P5). Reusing other vocabularies. (To appear)