Linking Research

Dagstuhl seminar report: Reproducibility of Data-Oriented Experiments in e-Science

Posted by dgarijov on February 21, 2016

[Figure: Dagstuhl Castle, the venue for the seminar]

During the last week of January I was invited to a Dagstuhl seminar on reproducibility in e-Science, and I think it would be helpful to summarize and highlight some of its results in this blog post. A more detailed report will be published in the next few months, so take this as a sneak peek. If you want to reference any of the figures or tables in this summary, please cite the Dagstuhl report.

So… what are Dagstuhl seminars?

They are one-week meetings that bring together researchers from a community to discuss a given topic. The seminars are held at Schloss Dagstuhl (the Leibniz Center for Informatics), near Wadern, a location far from any big city. Basically, the purpose of these seminars is to isolate the participants from the world in order to push forward the discussion on the topic at hand.

 What was I doing there?

Discuss, learn, take notes and disseminate the work my colleagues and I have been doing! In the Ontology Engineering Group we have carried out several initiatives to promote the reproducibility of scientific experiments, ranging from formalizing protocols so that missing or inconsistent details can be detected, to automatically documenting and publishing workflows, conserving their infrastructure, bundling them with their associated resources into research objects, and handling their intellectual property rights. You can see the slides I presented during the seminar at this link: http://www.slideshare.net/dgarijo/reproducibility-using-semantics-an-overview.

 The seminar

The seminar was organized by Andreas Rauber, Norbert Fuhr and Juliana Freire, and I think they did a great job bringing together people from different areas: information retrieval, psychology, bioinformatics, etc. It would have been great to see more people from libraries (who have been in charge of preserving knowledge for centuries), publishers and funding agencies, as in my opinion they are the ones who can really push reproducibility forward by making authors comply with reproducibility guidelines/manifestos. Maybe we can use the outcomes of this seminar to convince them to join us next time.

Avoiding reproducing previous reproducibility outcomes

To be honest, I was a bit afraid that this effort would result in just another manifesto or set of guidelines for enabling reproducibility. Some of the attendees shared the same feeling, so one of the first items on the agenda was a round of summaries of other reproducibility workshops that participants had attended, like the EuroRV3 workshop or the Artifact Evaluation for Publication workshop (also held at Dagstuhl!). This helped shape the agenda a little and move the discussion forward.

Tools, state of the art and war stories:

Discussion is the main purpose of a Dagstuhl seminar, but the organizers scheduled a couple of sessions for each participant to introduce what they had been doing to promote reproducibility. This included specific tools for enabling reproducibility (e.g., noWorkflow, ReproZip, yesWorkflow, ROHub, etc.), updates on the state of the art of a particular area (e.g., the work done by the Research Data Alliance, music, earth sciences, bioinformatics, visualization, etc.) and war stories from participants who had attempted to reproduce other people’s work. In general, the presentations I enjoyed the most were the war stories. At the beginning of my PhD I had to reproduce an experiment from a paper, and it involved some frustration and a lot of work. I was amazed by the work done by Martin Potthast (see paper) and Christian Collberg (see paper) to empirically reproduce the work of others. In particular, Christian maintains a list of the papers he and his group have been able to reproduce. Check it out here.

Measuring the information gain

What do we gain by making an experiment reproducible? In an attempt to address this question, we identified the main elements into which a scientific experiment can be decomposed. Then we analyzed what would happen if each of these components changed, and how each of these changes relates to reproducibility.

The atomic elements of an experiment are: the goals of the experiment, the abstract methods (algorithms, steps) used to achieve those goals, the concrete implementation of the abstract method, the execution environment or infrastructure used to run the experiment, the input data and parameter values, and the scientists involved in executing the experiment. An example is given below, followed by a small sketch of how these elements could be written down:

  • (R) Research Objectives / Goals: Reorder stars by their size.
  • (M) Methods / Algorithms: Quicksort.
  • (I) Implementation / Code / Source-Code: Quicksort in Java.
  • (P) Platform / Execution Environment / Context: OS, JVM, RAM memory.
  • (D) Data (input data and parameter values): The dataset X from the Virtual Observatory catalog.
  • (A) Actors / Persons: Daniel, who designs and executes the experiment.
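
To make the decomposition more tangible, here is a minimal sketch (plain Python with hypothetical names, not something from the report) of how the six elements of the example above could be recorded:

    from dataclasses import dataclass

    @dataclass
    class Experiment:
        """The six atomic elements an experiment can be decomposed into."""
        research_goal: str   # (R) what the experiment tries to achieve
        method: str          # (M) abstract algorithm or steps
        implementation: str  # (I) concrete code realizing the method
        platform: str        # (P) execution environment / infrastructure
        data: str            # (D) input data and parameter values
        actors: str          # (A) people designing/executing the experiment

    # The example above, expressed as a record:
    star_sorting = Experiment(
        research_goal="Reorder stars by their size",
        method="Quicksort",
        implementation="Quicksort in Java",
        platform="OS, JVM, RAM memory",
        data="Dataset X from the Virtual Observatory catalog",
        actors="Daniel, who designs and executes the experiment",
    )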

Changing one of these elements while preserving the others changes what a successful re-execution tells us. For example, if we change the parameter values but keep the rest the same, we test the robustness (or sensitivity) of the experiment; if we change the input data, we test its generality (new data may expose corner cases that were not considered before); and if we successfully change the platform while preserving the rest, we demonstrate the portability of the experiment. The following table summarizes the overall discussion. Due to time constraints we didn’t enumerate every possible combination of changes, but we covered the ones that are most likely to happen (the sketch after the table encodes the same mapping):

Involved part    | (1) | (2) | (3) | (4) | (5) | (6) | (7)
Research goal    |  0  |  0  |  0  |  0  |  0  |  0  |  1
Method           |  0  |  0  |  0  |  0  |  0  |  1  | 0/1
Implementation   |  0  |  0  |  0  |  0  |  1  | 0/1 | 0/1
Platform         |  0  |  0  |  0  |  1  | 0/1 | 0/1 | 0/1
Data parameters  |  0  |  1  | 0/1 |  0  |  0  | 0/1 | 0/1
Input data       |  0  |  0  |  1  |  0  |  0  |  0  |  0
Actors           |  0  | 0/1 | 0/1 | 0/1 | 0/1 | 0/1 | 0/1

Legend: 0 = no change, 1 = change, 0/1 = doesn’t matter.

Information gain per column: (1) Consistency, (2) Robustness / sensitivity, (3) Generality, (4) Portability / adoption, (5) Portability / adoption, (6) Independent validation, (7) Repurposability.
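
Read column by column, the table is essentially a decision rule: check the changed parts from the most to the least far-reaching and report the corresponding information gain. Below is a minimal sketch of that rule (hypothetical Python, not from the report); the actors row is marked 0/1 ("doesn’t matter") in all but the first column, so it is left out:

    def information_gain(goal=False, method=False, implementation=False,
                         platform=False, data_params=False, input_data=False):
        """Map a pattern of changed experiment parts (True = changed) to
        the information gained by a successful re-execution, following
        the columns of the table above."""
        if goal:
            return "Repurposability"            # column (7)
        if method:
            return "Independent validation"     # column (6)
        if implementation:
            return "Portability / adoption"     # column (5)
        if platform:
            return "Portability / adoption"     # column (4)
        if input_data:
            return "Generality"                 # column (3)
        if data_params:
            return "Robustness / sensitivity"   # column (2)
        return "Consistency"                    # column (1): nothing changed

    # Example: rerunning the same experiment on a different platform only
    print(information_gain(platform=True))      # -> Portability / adoption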

 

 Decomposing reproducibility

There are three main types of actions that you can take to improve the reproducibility of your work: proactive actions (e.g., data sharing, workflow sharing, metadata documentation, etc.), reactive actions (e.g., a systematic peer review of the components of your experiment, reimplementation studies, etc.) and supportive actions (e.g., corpus construction for reproducibility, libraries of tools that help reproducibility, etc.). These actions apply at three different scopes: the reproducibility of individual papers, the reproducibility of groups of papers in a particular area of interest (like health studies that recommend a solution for a particular problem), and the creation of benchmarks that ensure that a proposed method can be executed with other state-of-the-art data.
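
Purely as an illustration (hypothetical names, not from the report), the taxonomy can be seen as two independent axes: the type of action taken and the scope it affects.

    from enum import Enum

    class Action(Enum):
        # How you act to improve reproducibility
        PROACTIVE = "data/workflow sharing, metadata documentation"
        REACTIVE = "systematic peer review of components, reimplementation studies"
        SUPPORTIVE = "reproducibility corpora, libraries of supporting tools"

    class Scope(Enum):
        # What the action applies to
        PAPER = "an individual paper"
        AREA = "groups of papers in an area of interest"
        BENCHMARK = "benchmarks exercising a method on other state-of-the-art data"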

The following figure (extracted from the report draft) summarizes the taxonomy discussion:

[Figure: A taxonomy for reproducibility]

Actors in reproducibility and guidelines for achieving it

Another activity I think is worth mentioning in this summary is the analysis part of the group did of the different types of actors that participate, in one way or another, in reproducibility, along with the obstacles these actors may find in their path.

There are six types of actors in reproducibility: those who create content (authors, lab directors, research software engineers, etc.), those who consume it (readers, users, authors, students, etc.), those who moderate it (editors), those who examine it (reviewers, examiners, etc.), those who enable its creation (funders, lab directors, etc.) and those who audit it (policy makers, funders).

For each of these actors, the group discussed checklists that guide them toward the reproducibility of their content at three different levels: sufficient (i.e., the minimum expected of the actor regarding the demands for reproducibility), better (an additional set of demands that improve on the previous ones) and exemplary (i.e., best practices). An example of these checklists for authors can be seen below (extracted from the report), followed by a sketch of how such a checklist could be encoded:

Sufficient:

  • Methods section – to a level that allows imitation of the work
  • Appropriate comparison to appropriate benchmark
  • Data accurately described
  • Can re-run the experiment
  • Verify on demand (provide evidence that the work was done as described)
  • Ethical considerations noted, clearances listed
  • Conflicts noted, contributions and responsibilities noted
  • Use of other authors’ reproducibility materials should respect the original work and reflect an attempt to get best-possible results from those materials

Better:

  • Black/white box
  • Code is made available, in the form used for the experiments
  • Accessible or providable data

Exemplary:

  • Open-source software
  • Engineered for re-use
  • Accessible data
  • Published in trustworthy, enduring repository
  • Data recipes, to allow construction of similar data
  • Data properly annotated and curated
  • Executable version of the paper; one-click installation and execution
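
Just to illustrate how such a checklist could be used programmatically (a hypothetical sketch, not part of the report), the three levels can be kept as plain data and compared against what an author has already covered:

    # Abbreviated from the author checklist excerpt above (hypothetical structure).
    AUTHOR_CHECKLIST = {
        "sufficient": [
            "methods section allows imitation of the work",
            "appropriate comparison to an appropriate benchmark",
            "data accurately described",
            "experiment can be re-run",
            "evidence on demand that the work was done as described",
            "ethical considerations noted, clearances listed",
            "conflicts, contributions and responsibilities noted",
        ],
        "better": [
            "black/white box access",
            "code available in the form used for the experiments",
            "accessible or providable data",
        ],
        "exemplary": [
            "open-source software engineered for re-use",
            "data in a trustworthy, enduring repository, annotated and curated",
            "data recipes to allow construction of similar data",
            "executable version of the paper (one-click installation and execution)",
        ],
    }

    def missing_items(done, level="sufficient"):
        """Return the items at `level` that are not yet in the set `done`."""
        return [item for item in AUTHOR_CHECKLIST[level] if item not in done]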

Making a reproducibility paper publishable

Another cool effort aimed to determine whether reproducibility is a means or an end for a publication. The group discussed whether an effort to reproduce an existing research paper would be publishable or not, depending on the available resources and the obtained outcome. Generally, when someone intends to reproduce existing work, it is because they want to repurpose it or reuse it in their own experiments. But that objective may be hindered, for example, if the code implementing the method to be reproduced is no longer available. The discussion led to the following diagram, which covers a set of possible scenarios:

[Figure: Can reproducibility help you to publish a paper?]

In the figure, the red crosses indicate that the effort would not have much value as a new publication. The pluses indicate the opposite, and the number of pluses suggests the target of the publication (one plus would be a workshop, while four pluses would be a top journal/conference). I find the diagram particularly interesting, as it introduces another benefit of trying to reproduce someone else’s experiments.

 Incentives and barriers, or investments and returns?

Incentives are often the main reason why people adopt best practices and guidelines. The problem is that, in the case of reproducibility, each incentive also has an associated cost (e.g., making all the resources available under an open license). If the cost is excessive relative to the return, some people might just not consider it worth it.

One of the discussion groups aimed to address this question by categorizing the costs/investments (e.g., artifact preparation, documentation, infrastructure, training, etc.) and the returns/benefits (publicity, knowledge transfer, personal satisfaction, etc.) for the different actors identified above (funders, authors, reviewers, etc.). The tables are perhaps too big to include here (you can have a look once we publish the final report), but in my opinion the message to take home is that we have to be aware of both the cost of reproducibility and its advantages. I have personally experienced how frustrating it is to document in detail the inputs, methods and outputs used in a Research Object that expands on a paper that has already been accepted. But then, I have also seen the benefits of that effort when I wanted to rerun the evaluations several months later, after I had made additional improvements.

 Defining a Research Agenda: Current challenges in reproducibility

Do you want to start a research topic on reproducibility? Here are a few challenges that may give you ideas for contributing to the state of the art:

  1. What are the interventions needed to change the behavior of researchers?
  2. Do reproducibility and replicability translate into long-term impact for your work?
  3. How do we set up the research environment to enable reproducibility?
  4. Can we measure the cost of reproducibility/repeatability/documentation? What are the difficulties for newcomers?

Final thoughts:

In conclusion, I think the seminar was a positive experience. I learnt, met new people and discussed a topic very close to my research area with experts in the field. A couple of things could be improved, like better synchronization with other reproducibility efforts taking place at Dagstuhl or more representation from publishers and funding agencies, but I think the organizers will take this into account for future meetings.

Special thanks to Andy, Norbert and Juliana for making the seminar happen. I hope everyone enjoyed it as much as I did. If you want to know more about the seminar and some of its outcomes, have a look at the report!

[Figure: Participants of the Dagstuhl seminar]


