Linking Research

Posts Tagged ‘PROV’

Provenance Week 2014

Posted by dgarijov on June 20, 2014

Last week I attended Provenance Week in Cologne. For the first time, IPAW and TAPP were held together, even sharing some sessions like the poster lightning talks. The clear benefit of co-locating both events is that a bigger part of the community was actually able to attend, even if some argued that five full days of provenance is too long. I got to see many familiar faces, and finally met some people I had previously only talked to remotely.

In general, the event was very interesting and definitely worth attending. I was able to get an overview of the state of the art of provenance in many different domains, and of how to protect it, collect it and exploit it for various purposes. Different sessions led to different discussions, but two topics stood out for me in particular:

The “sexy” application for provenance (Paul Groth). After years of discussion we have a standard for provenance, and many applications are starting to use it and extend it to represent provenance across different domains. But there is no application yet that uses provenance from different sources to do something meaningful for the end user. Some applications define domain-dependent metrics to assess trust, others like PROV-O-Viz visualize it to show what is going on in the traces, and others try to use it to explain what kind of things we can find in a particular dataset. But we still don’t have the provenance killer app… will the community be able to find it before the next Provenance Week?

Provenance has been discussed for many years now. How come we are still so irrelevant? (Beth Plale). This was brought up by the keynote speaker and organizer Beth Plale, who talked about different consortia in the U.S. that are starting to care about provenance (e.g., HathiTrust or the Research Data Alliance). As some people pointed out, provenance has gained a lot of importance in recent years, to the point that some grants will only be awarded if the researchers guarantee the tracking of provenance. The standard helps, but we are still far from solving the provenance-related issues. Authors and researchers have to see the benefit of publishing provenance (e.g., attribution, with something like PROV pingbacks); otherwise it will be very difficult to convince them to do so.

Luc getting prepared for his introductory speech in IPAW

Apart from the pointers I have included above, many other applications and systems were presented during the week. These are my highlights:

Documentation of scientific experiments: a cool application for generating documentation of workflows using a Python notebook and PROV-O-Viz. Tested with Ducktape’s workflows.

Reconstruction of provenance: Hazeline Asuncion and Tom de Nies both presented their approaches for finding the dependencies among data files when the provenance is lost. I find this very interesting because it could potentially be used to label workflow activities automatically (e.g., with our motif list).

Provenance capture: RDataTracker, an intrusive yet simple way of capturing the provenance of R scripts. Other approaches like noWorkflow also looked promising, but seemed a little heavier.

Provenance benchmarking: Hugo Firth presented ProvGen, an interesting approach for creating huge synthetic provenance graphs that simulate real-world properties (e.g., Twitter data). All the new provenance datasets were added to the ProvBench GitHub page, and are now also on Datahub.

Provenance pingbacks: Tim Lebo and Tom de Nies presented two different implementations (see here and here) of the PROV pingback mechanism defined by the W3C. Even though security might still be an issue, this is a simple mechanism for giving attribution to authors. Fantastic first steps!
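As a rough sketch of how the PROV-AQ pingback mechanism works (all URIs below are invented for illustration): the server hosting a resource advertises a pingback service in an HTTP Link header, and a consumer of that resource later POSTs a plain list of URIs where provenance about its reuse can be found:

```
1. The response for the original resource advertises a pingback URI:

   HTTP/1.1 200 OK
   Link: <http://example.org/pingback/data-123>; rel="http://www.w3.org/ns/prov#pingback"

2. A consumer reports where provenance involving that resource lives:

   POST /pingback/data-123 HTTP/1.1
   Host: example.org
   Content-Type: text/uri-list

   http://consumer.example.net/provenance/derivation-456
```

The nice property is that the original publisher does not need to know its consumers in advance; attribution information flows back to it.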

Provenance abstraction: Paolo Missier presented a way of simplifying provenance graphs while preserving the PROV notation, which makes it easier to understand what is going on in the provenance trace. Roly Perera presented an interesting survey on how abstraction is also being used to provide different levels of privacy when accessing the data, which will become more and more important as provenance gains a bigger role.

Applications of provenance: one of my favorites was Trusted Tiny Things, which aims at describing everyday things with provenance descriptions. This would make it possible to know, in a city, how much the government spent on a certain item (like a statue) and who was responsible for buying it. Other interesting applications were Pinar Alper’s approach for labeling workflows, Jun Zhao’s approach for generating queries to explore provenance datasets, and Matthew Gamble’s metric for quantifying the influence of one article on another using only provenance.

Trusted Tiny Things presentation

The Provenance Analytics workshop: I was invited to co-organize this satellite event on the first day. We got 11 submissions (8 accepted) and managed to keep a nice session running, plus some discussion at the end. Ongoing work on applications of provenance to different domains (cloud, geospatial, national climate, crowdsourcing, scientific workflows) was presented, and the audience was happy to provide feedback. I wouldn’t mind doing it again 🙂

The prov analytics workshop (pic by Paul Groth)

Posted in Conference, Tutorial, Workshop | 2 Comments »

Elevator pitch

Posted by dgarijov on February 16, 2014

During my time as a PhD student, many people have asked me about the subject of my thesis and the main ideas behind my research. As a student you always think you have a very clear idea of what you are doing, at least until you actually have to explain it to someone outside your domain. In the end, it is all about using the right terminology. If you say something like “Oh yeah, I am trying to detect abstractions in scientific workflows semi-automatically in order to understand how they can be better reused and related to each other”, people will look at you as if you didn’t belong to this planet. Instead, something like “detecting commonalities in scientific experiments in order to study how we can understand them better” might be more appropriate.

But last week the challenge was slightly different. I was invited to give an overview talk about the work I have been doing as a PhD student: not only what I am doing, but why I am doing it and how it is all related, without going into the details of every step. It may appear to be an easy task, but it kept me thinking more than I expected.

As I think some people might be interested in a global overview, I want to share the presentation here as well: http://www.slideshare.net/dgarijo/from-scientific-workflows-to-research-objects-publication-and-abstraction-of-scientific-experiments. Have a look!

Posted in e-Science, Linked Data, Provenance, Research Object, scientific workflows, Taverna, Tutorial, Wings | Leave a Comment »

The PROV family of specifications is released

Posted by dgarijov on March 13, 2013

The PROV family of documents was finally released yesterday (March 12) as a W3C Proposed Recommendation (link to the official post) by the Provenance Working Group. This family of documents consists of 4 Recommendations and 8 Notes that will help you describe how to model, use and interchange provenance on the Web.

So, where to start? I would recommend having a look at the PROV-Overview Note, which gives a high-level overview of all the documents in the family and how they are connected. If you just want to use the model, then I would recommend taking a look at the Primer Note, which explains the functionality of the PROV model with simple examples. The rest of the documents serve different purposes:

  • PROV-O, the PROV ontology (Proposed Recommendation), is an OWL2 ontology allowing the mapping of PROV to RDF.
  • PROV-DM (Proposed Recommendation), the PROV data model for provenance.
  • PROV-N (Proposed Recommendation), a notation for provenance aimed at human consumption.
  • PROV-CONSTRAINTS (Proposed Recommendation), a set of constraints applying to the PROV data model.
  • PROV-XML (Note), an XML schema for the PROV data model.
  • PROV-AQ (Note), the mechanisms for accessing and querying provenance.
  • PROV-DICTIONARY (Note) introduces a specific type of collection, consisting of key-entity pairs.
  • PROV-DC (Note) provides a mapping between PROV and Dublin Core Terms.
  • PROV-SEM (Note), a declarative specification in terms of first-order logic of the PROV data model.
  • PROV-LINKS (Note) introduces a mechanism to link across bundles.
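To give a quick flavour of the model, here is a minimal toy example in PROV-N (the `ex:` identifiers are invented for illustration, not taken from any of the specifications), stating that a chart was attributed to an agent:

```
document
  prefix ex <http://example.org/>

  entity(ex:chart1)
  agent(ex:derek)
  wasAttributedTo(ex:chart1, ex:derek)
endDocument
```

The same statement can be expressed in RDF with PROV-O, or serialized with the PROV-XML schema; PROV-DM defines the underlying data model they all share.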

These descriptions were prov:wasQuotedFrom the Overview.
I’ll try to create a post in the next few days on how to add simple PROV statements to your web page.
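In the meantime, here is a rough sketch of what such simple PROV statements could look like when embedded in a page with RDFa (all URIs and names below are placeholders for illustration):

```html
<!-- Hypothetical example: a blog post marked up as a prov:Entity,
     attributed to its author and quoting the PROV-Overview. -->
<div prefix="prov: http://www.w3.org/ns/prov#"
     about="http://example.org/posts/prov-released"
     typeof="prov:Entity">
  <p>Posted by
    <span rel="prov:wasAttributedTo"
          resource="http://example.org/people/dgarijov">dgarijov</span>
  </p>
  <p>Based on the
    <a rel="prov:wasQuotedFrom"
       href="http://www.w3.org/TR/prov-overview/">PROV-Overview</a>
  </p>
</div>
```

An RDFa-aware crawler would extract the `prov:wasAttributedTo` and `prov:wasQuotedFrom` triples from this markup, so the attribution travels with the page itself.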

Posted in Provenance | Leave a Comment »