While being a PhD student, many people have asked me about the subject of my thesis and the main ideas behind my research. As a student you always think you have very clear what you are doing, at least until you have to actually explain it to someone who is not related to your domain. In fact, it is about using the right terminology. If you say something like “Oh yeah, I am trying to detect abstractions on scientific workflows semi-automatically in order to understand how they can better be reused and related to each other”, people will look at you as if you didn’t belong to this planet. Instead, something like “detecting commonalities in scientific experiments in order to study how we can understand them better” might be more appropriate.
But last week the challenge was slightly different. I was invited to give an overview talk about the work I have been doing as a PhD student. And that is not only what I am doing, but why am I doing it and how is it all related without going into the details of every step. It may appear as an easy task, but it kept me thinking more than I expected.
As I think some people might be interested in a global overview, I want to share the presentation here as well: http://www.slideshare.net/dgarijo/from-scientific-workflows-to-research-objects-publication-and-abstraction-of-scientific-experiments. Have a look!
This week there was an announcement about the deadline extension for BIGPROV13. Apparently, some authors are preparing new submissions for next week. In previous posts I highlighted how the community has been demanding a provenance benchmark to test different analyses on provenance data, so today I’m going to describe how I have been contributing to the publication of public accessible provenance traces from scientific experiments.
It all started last year, when I did an internship in the Information Sciences Institute (ISI) to reproduce the results of the TB-Drugome experiment, led by Phil Bourne’s team in San Diego. They wanted to make accessible the method followed in their experiment in order to be reused by other scientists, for which it is necessary to publish sample traces of the experiment, the templates and every intermediate output and source. As a result, we reproduced the experiment with a workflow using the Wings workflow system, we extended the Open Provenance Model (OPM) to represent the traces as the OPMW profile, and we described here the process necessary in order to publish the templates and traces of any workflow as Linked Data. Lately we have aligned the previous work with the emerging PROV-O standard, providing serializations of both OPM and PROV for each workflow that is published. You can find the public endpoint here, and an exemplar application that loads into a wiki the data of a workflow(dynamically) can be seen here.
I have also been working with the Taverna people in the wf4Ever project to create a curated repository of runs from both Taverna and Wings, compatible with PROV (since both systems are similar and extend the standard to describe their workflows). The repository, available here for anyone that wants to use it, has been submitted to the BIGPROV13 call and hopefully will get accepted.
So… now that we have a standard for representing provenance the big questions are: What do I do with all the provenance I generate? How do I interoperate with other approaches? At what granularity do I record the activities of my website? How do I present provenance information to the users? How do I validate provenance? How do I complete it? Many challenges remain to be solved until we can hit Tim Berners Lee’s OH Yeah? button of every web resource.