Month: February 2013

Annotating your personal page with RDFa

A couple of weeks ago some OEG members and I organized a small tutorial on RDFa for the rest of the group (also known as the First OEG RDFa Collaborative Tripleton). The final goal was to give an overview and eat our own dog food by annotating our personal web pages with some simple RDFa statements. The bait: free pizza.

Participants enjoying their pizza. It always works

People participated actively, and we discussed several examples during the tutorial. Given that nobody was an RDFa expert, I think the overall experience was very useful for everyone.
So, if you want to annotate your page with some RDFa statements, I have prepared a small guide below listing the main steps to take into consideration. The guidelines are based on what we discussed during the tutorial and afterwards:

1) Distinguish your web page from yourself: don't use the URL of your home page as your own URI. Instead, create a URI for yourself. If I want to describe my personal page (title, creation date, etc.), I use the page's URL; if I want to add descriptive statements about myself (name, email, phone, etc.), I use a separate URI, typically the page URL followed by a fragment such as #me. This is a recognized good practice, although you can use any identifier for yourself as long as you control the domain where you create it.
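As an illustration, the distinction can be expressed in RDFa roughly like this (all names and URLs below are placeholders, not my actual page):

```html
<html prefix="dc: http://purl.org/dc/terms/ foaf: http://xmlns.com/foaf/0.1/">
  <head>
    <!-- Subject here is the page itself (its own URL) -->
    <title property="dc:title">Jane Doe's homepage</title>
  </head>
  <!-- The about attribute switches the subject to the person: page URL + #me -->
  <body about="#me" typeof="foaf:Person">
    <h1 property="foaf:name">Jane Doe</h1>
  </body>
</html>
```

Statements in the head are about the document; everything inside the body is about the person identified by the #me fragment.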

2) Provide at least a minimum set of statements about yourself: if you provide some information in HTML for users, why not in RDFa for machines? Add your name, an image, phone, email, a link to your publications, the institutions you work for, and past and present projects.
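A minimal profile along those lines might look like the following sketch (all values are placeholders):

```html
<div prefix="foaf: http://xmlns.com/foaf/0.1/"
     about="#me" typeof="foaf:Person">
  <span property="foaf:name">Jane Doe</span>
  <img rel="foaf:img" src="photo.jpg" alt="Jane Doe"/>
  <a rel="foaf:mbox" href="mailto:jane@example.org">Email</a>
  <span property="foaf:phone" content="tel:+34-000-000-000">Phone</span>
  <a rel="foaf:workplaceHomepage" href="http://example.org/institution">My institution</a>
  <a rel="foaf:publications" href="publications.html">My publications</a>
</div>
```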

3) Use widely adopted vocabularies: schema.org and FOAF for describing yourself, Dublin Core for describing the document and, if you want to state the provenance of the document itself, the PROV standard.
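In RDFa 1.1 all of these vocabularies can be declared once with a prefix attribute on any ancestor element, for example:

```html
<body prefix="schema: http://schema.org/
              foaf:   http://xmlns.com/foaf/0.1/
              dc:     http://purl.org/dc/terms/
              prov:   http://www.w3.org/ns/prov#">
  <!-- schema:/foaf: terms for yourself, dc: for the document, prov: for provenance -->
</body>
```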

4) Try to reuse existing authoritative URIs for the resources you are describing: linking to other resources is always better than creating your own URIs. Try looking them up in DBpedia or Sindice first; if you can't find a URI for an institution or a project, you can always create your own and add an owl:sameAs statement once you discover the authoritative one.
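Such an equivalence statement can itself be written in RDFa; a sketch with placeholder URIs:

```html
<!-- Links your locally minted URI to an authoritative one (both are placeholders) -->
<span prefix="owl: http://www.w3.org/2002/07/owl#"
      about="#my-project"
      rel="owl:sameAs"
      resource="http://dbpedia.org/resource/Some_Project"></span>
```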

5) Validate your RDFa! Before publishing, be sure to test the statements you have produced with an RDF validator.

Do you want to know more? Check out the RDFa Primer! It's full of examples and is very easy to follow.

Towards a user-friendly SPARQL

One of the main difficulties in using the SPARQL query language is that you have to be familiar with the ontologies or vocabularies in which the data is modeled before you can query an endpoint. A couple of weeks ago Machinalis released a new framework called Quepy, which composes SPARQL queries from natural language. The snapshot below shows an example where I asked for information about Quentin Tarantino:

Snapshot of the application

The proposal is very simple: you write your question and then test it against DBpedia, the most popular dataset in Linked Data. The types of queries supported at the moment are limited (simple statements over the most popular vocabularies), but I expect they will improve in the near future.
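To give an idea, a question like "Who is Quentin Tarantino?" maps to a DBpedia query roughly along these lines (a sketch of the general shape, not Quepy's literal output):

```sparql
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbo:  <http://dbpedia.org/ontology/>

SELECT DISTINCT ?abstract WHERE {
  # Find the resource labeled with the person's name
  ?person rdfs:label "Quentin Tarantino"@en ;
          dbo:abstract ?abstract .
  # Keep only the English abstract
  FILTER (lang(?abstract) = "en")
}
```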

Quepy aims to fill the gap between users and the technology, which in my opinion is precisely what we should do to get people to start using semantic technologies. However, the release of the framework generated some discussion on the mailing lists. Some people didn't find it useful, arguing that the results obtained from the SPARQL queries generated by Quepy were not as accurate as those returned by Google or Bing. This is true, although I don't think it's fair to compare a newly released (and limited) framework with the results of two of the biggest companies in the market. Furthermore, the results depend on the dataset the queries are run against. Other datasets might give more accurate results, which is part of the beauty of the approach: you don't depend on a single dataset.

Quepy is a nice initiative, but much work remains to be done to map complex natural language questions to languages like SPARQL. The potential of this kind of tool is clear: it can provide an exact answer to what the user is looking for, drawing on several distributed datasets instead of independent silos of information (like Bing or Google). In our group some people are researching this topic as well, analyzing the most common patterns used when querying Linked Data datasets. Combining natural language processing with such pattern-based approaches could increase the number of queries covered beyond what each technique achieves separately.