Tag: workshop

Deploying sensors between a volcano and a hurricane: The IS-GEO 2018 Summer Institute

A few weeks ago I had the pleasure of attending the second IS-GEO summer institute, in Hawaii. The meeting was led by Suzanne Pierce and Daniel Fuka, who managed to bring together more than 40 participants with different backgrounds and expertise. From the environmental sciences side, we had researchers specialized in areas such as Hydrology, Ecology or Meteorology. From the intelligent systems side, we had experts in sensor handling and deployment, data analytics, data integration, reproducibility and data visualization. We worked together assembling and deploying hand-made sensors in the 8 different ecosystems of Hawaii’s Big Island. All while the volcano was still active and hurricane Hector approached from the south-east! But let me go step by step.

What is IS-GEO?

For those who are not familiar with the organization, IS-GEO (https://is-geo.org) is a Research Collaboration Network (RCN) funded by NSF through the EarthCube program that aims to bring together researchers from Intelligent Systems (IS) and Geosciences (GEO). The RCN has had a presence at top conferences like AGU, where it has led a session for the last couple of years. In addition, we organize (yes, I am part of the RCN as well) monthly teleconferences where we invite an expert from either the IS or GEO side to talk about their latest research. Have a look here: https://is-geo.org/resources/research-presentations/

Exploring NEW research, not just exposing previous work

From the very beginning, the objectives of the event were made clear: collaborate, define new challenges, enhance communication among attendees and finally define potential robust collaborations. These objectives are key in interdisciplinary conversations, as a fruitful collaboration will only happen when both sides are interested in different aspects of the same research problem. The program was structured so we would have a few presentations during the morning and then spend some time crafting sensors and deploying them in the afternoon. The teams had a flexible structure, and we spent quite some time in the vans on the way to the field, which allowed everyone to talk to everyone else about their expertise and interests.

An unexpected guest: Hurricane Hector

Hurricane Hector introduced a change of plans. Instead of deploying just a few initial stations, we decided to prioritize sensor deployment during the first days, and then see if by the end of the week we could visualize actual data that could measure the impact of the hurricane. We had a quick introduction to the different types of sensors, and I was amazed at how much Arduino and open hardware initiatives have simplified integrating sensors and reading from them. It makes you want to create a small sensor station at your home!

Daily meteorological reports

One of the perks of being surrounded by scientists is that we had access to the local meteorologist, Harry Halpin, who provided reports of the movement of Hector every day before starting the sessions. Supercool!

It was also thanks to him that we were able to get close to the lava cam (http://lavacam.org/), which reports continuously on the status of the volcano, and take pictures of the recent lava burst, such as this one:

2018-08-07 01.18.21

Building and planting sensors in the field:

We managed to deploy several sensor/weather stations in different ecosystems of the island. Some were deployed at attendees’ homes, such as Dan Fuka’s backyard:

Others were deployed at more distant points of the island. Here, several attendees are setting up a weather station near a Buddhist temple:

2018-08-08 00.25.49

Or near a mountain, on an old lava field. The dome is a research facility to simulate living conditions on Mars:

2018-08-08 23.08.21

Visualizing data:

Once the stations were set up, we connected them to the CHORDS platform and visualized them on a map (http://is-geo.chordsrt.com/sites/map). Mike Daniels explained how all data is available for download, and how to set up stations that register new data in CHORDS. Unfortunately, we didn’t have much time to do data analysis, but learning about the data acquisition process was a valuable lesson for all the attendees. Collecting data requires hard work, and integrating and visualizing it to make it useful is full of challenges, from sensor calibration to error detection.
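To give a flavor of the error detection challenge, here is a minimal sketch in Python. It is not what CHORDS or the institute used; the function name, the thresholds and the sample temperatures are all made up for illustration. It flags readings that fall outside a plausible physical range or that jump too far from the median of the last few accepted readings.

```python
from statistics import median


def flag_suspect_readings(values, low, high, window=5, max_jump=5.0):
    """Return the indices of readings that look suspicious.

    A reading is flagged if it falls outside [low, high], or if it differs
    from the median of the last `window` accepted readings by more than
    `max_jump`. All thresholds here are illustrative assumptions.
    """
    flagged = []
    recent = []  # the most recent readings that passed both checks
    for i, value in enumerate(values):
        out_of_range = not (low <= value <= high)
        # Only test for spikes once a few accepted readings are available.
        spike = len(recent) >= 3 and abs(value - median(recent)) > max_jump
        if out_of_range or spike:
            flagged.append(i)
        else:
            recent.append(value)
            recent = recent[-window:]
    return flagged


# Hypothetical air temperatures in Celsius with two obvious glitches.
temps_c = [24.1, 24.3, 24.2, 85.0, 24.4, 24.6, -40.0, 24.5]
print(flag_suspect_readings(temps_c, low=-10.0, high=50.0))
```

A real deployment would also need per-sensor calibration and sensible handling of gaps in the time series, which this sketch ignores.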

Personal collaboration outcomes and takeaways:

I wish there were more events like this one, combining a potential asset to the community (a data product that reflects the impact of Hector and future hurricanes) with hands-on sessions that explain how to create, program and collect data from sensors. I now have a greater appreciation for the amount of work that goes into the data collection process. Furthermore, I hadn’t built a circuit for a long time. It’s always good to refresh your memory with some hands-on Arduino work.

During the week, I also had the chance to collaborate with Suzanne Pierce and Daniel Hardesty-Lewis to create workflows for groundwater modeling. In fact, we were able to describe in a machine-readable manner how to invoke MODFLOW with different recharge files and add it to a model registry. With a little more effort, we will also be able to connect this work to data collected by a platform such as CHORDS.
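As a rough illustration of what a machine-readable description could look like, here is a small Python sketch. It is not the format we actually used: the schema, the `run_modflow` command, the file names and the in-memory registry are all hypothetical placeholders. The idea is simply that each configuration records which recharge file it uses, so a registry can list the variants and a workflow system can pick one.

```python
import json
from dataclasses import dataclass, asdict, replace


@dataclass
class ModelConfiguration:
    """Machine-readable description of one way to run a model (hypothetical schema)."""
    name: str
    executable: str  # placeholder command, not a verified MODFLOW invocation
    inputs: dict     # maps an input role (e.g. "recharge") to a file path


def with_recharge(base, recharge_file):
    """Derive a new configuration that differs only in its recharge file."""
    label = recharge_file.rsplit(".", 1)[0]
    new_inputs = {**base.inputs, "recharge": recharge_file}
    return replace(base, name=f"{base.name}-{label}", inputs=new_inputs)


def register(registry, config):
    """Store the configuration in the registry as a JSON-serializable dict."""
    registry[config.name] = asdict(config)
    return registry


# All names and file paths below are invented for the example.
base = ModelConfiguration(
    name="groundwater-demo",
    executable="run_modflow",
    inputs={"recharge": "recharge_baseline.rch"},
)

registry = {}
for rch in ["recharge_dry.rch", "recharge_wet.rch"]:
    register(registry, with_recharge(base, rch))

print(json.dumps(registry, indent=2))
```

Keeping the description as plain data (rather than a script) is what makes it possible to add it to a registry and query it later.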

Other highlights:

  • Planet Texas 2050 is going to be big! Created after Hurricane Harvey’s disaster, this project is going to deploy a whole new cyberinfrastructure to study climate variations, track the impact of pumping and try to predict how to irrigate different regions. I hope we can collaborate within our MINT project (see last bullet point).
  • The Ronin Institute looks like a great organization to apply for research grants when you can’t change locations.
  • IkeWai, like Planet Texas 2050, will set up a sensor infrastructure for data analysis. Many opportunities for data analysis!
  • Privacy and sensor data: There are many open questions on who should own sensor data when installed on private property. On the one hand, sensors could give away personal information about the owner. On the other hand, sensors could be exploited to detect illegal activities, such as water pollution.
  • Grafana is looking great for sensor data visualization. You can even configure alerts!
  • [Self promotion :D] I gave a presentation and demo on our Model INTegration (MINT) project, where we are trying to bring together models from economy, agronomy, hydrology and meteorology to answer important questions about a region. The project is only 6 months old, but so far we are making great progress! See our IEMSs paper for a full description of MINT!

General guidelines for reviewing a scientific publication

Lately I’ve been asked to do several reviews for different workshops, conferences and journals. In this post I would like to share with you a generic template to follow when reviewing a scientific publication. If you have been doing it for a while you may find it trivial, but I think it might be useful for people who have recently started reviewing. At least, when I started, I had to ask my advisor and colleagues for a similar one.

But first, several reasons why you should review papers:

  • It helps you identify whether a scientific work is good or not, and refine your criteria by comparing yourself with other reviewers. Also, it trains you to defend your opinion based on what you read.
  • It helps you refine your own work, by identifying common flaws that you normally don’t detect when writing your own papers.
  • It’s an opportunity to update your state of the art, or learn a little about other areas.
  • It allows you to contribute to the scientific community and gain public visibility.

A scientific work might be the result of months of work. Even if you think it is trivial, you should be methodical in explaining the reasons why you think it should be accepted or rejected (yes, even if you think the paper should be accepted). A review should not be just an “Accepted” or “Rejected” statement, but should also contain valuable feedback for the authors. Below you can see the main guidelines for a good review:

  • Start your review with an executive summary of the paper: this will let the authors know the main message you have understood from their work. Don’t copy and paste the abstract; try to communicate the summary in your own words. Otherwise they’ll just think you didn’t pay much attention when reading the paper.
  • Include a paragraph summarizing the following points:
    1. Grammar: Is the paper well written?
    2. Structure: Is the paper easy to follow? Do you think the order should have been different?
    3. Relevance: Is the paper relevant for the target conference/journal/workshop?
    4. Novelty: Is the paper dealing with a novel topic?
    5. Your decision. Do you think the work should be accepted for the target publication? (If you don’t, expand your concerns in the following paragraphs)
  • Major Concerns: Here is where you should say why you disagree with the authors, and highlight your main issues. In general, a good research paper should successfully describe four main points:
    1. What is the problem the authors are tackling? (Research hypothesis) This point is tricky, because sometimes it is really hard to find! And in some cases the authors omit it and you have to infer it. If you don’t see it, mention it in your review.
    2. Why is this a problem? (Motivation). The authors could have invented a problem which had no motivation. A good research paper is often motivated by a real world problem, potentially with a user community behind benefiting from the outcome.
    3. What is the solution? (Approach). The description of the solution adopted by the authors. This is generally easy to spot on any paper.
    4. Why is it a good solution? (Evaluation). The validation of the research hypothesis described in point one. The evaluation is normally the key part of the paper, and the reason why many research publications are rejected. As my supervisor has told me many times, one does not evaluate an algorithm or an approach; one has to evaluate whether the proposed algorithm or approach validates the research hypothesis.

When a paper describes the previous four points well, it is (generally) accepted. Of course, not all papers fall into the category of research papers (survey papers or analysis papers, for example, do not). But the four previous points should cover a wide range of publications.

  • Minor concerns: You can point out minor issues after the big ones have been dealt with. Not mandatory, but it will help the authors to polish their work.
  • Typos: Unless there are too many, you should point out the main typos you find in your review, as well as the sentences you think are confusing.

Other advice:

  • Don’t be a jerk: many reviews are anonymous, and people tend to be crueler when they know their names won’t be shown to the authors. Instead of saying that something “is garbage”, state clearly why you disagree with the authors’ proposal and conclusions. Let the facts speak for themselves, not your bias or opinion.
  • Consider the target publication. You can’t use the same criteria for a workshop, conference or journal. Normally people tend to be more permissive at workshops, where the evaluation is not that important if the idea is good, but require a good paper for conferences and journals.
  • Highlight the positive parts of the authors’ work, if any. Normally there is a reason why the authors have spent time on the presented research, even if the idea is not very well implemented.
  • Check the links, prototypes, evaluation files and, in general, all the supplementary material provided by the authors. A scientist should not only review the paper, but also the research described in it.
  • Be constructive. If you disagree with the authors on one point, always mention how they could improve their work. Otherwise they won’t know how to handle your issue and will ignore your review.

If you want more guidelines, you can check the ones Elsevier gives to their reviewers, or the ones by PLOS ONE.

E-Science 2014: The longest Journey

After a few days back in Madrid, I have finally found some time to write about the eScience 2014 conference, which took place last week in Guarujá, Brazil. The conference lasted for 5 days (the first two days with workshops), and it drew attendees from all over the world. It was especially good to see many young people who could attend thanks to the scholarships awarded by the conference, even when they were not presenting a paper. I found it a bit unorthodox that the presenters couldn’t apply for these scholarships (I wanted to!), but I am glad to see this kind of giveaway. Conferences are expensive and I was able to have interesting discussions about my work thanks to this initiative. I think this is also a reflection of Jim Gray’s will: pushing science into the next generation.

We stayed in a tourist resort in Guarujá, at the beach. This is what you could see when you got out of the hotel:

Guarujá beach

And the jungle was not far away either. After a 20 minute walk you were able to arrive at something like this…

The jungle was not far from the beach either

…which is pretty amazing. However, the conference schedule was packed with interesting talks from 8:30 to 20:30 most days, and in general we were unable to do any sightseeing. In my opinion they could have dropped one workshop day and relaxed the schedule a little bit. Or at least removed the parallel sessions in the main conference. It always sucks to have to choose between two different interesting sessions. That said, I would like to congratulate everyone involved in the organization of the conference. They did an amazing job!

Another thing that surprised me is that I wasn’t expecting to see many Semantic Web people, since the ISWC Conference occurred at the same time in Italy, but I found quite a few. We are everywhere!

I gave two talks at the conference, summarizing the results I achieved during my internship at the Information Sciences Institute earlier this year. First I presented a user survey quantifying the benefits of creating workflows and workflow fragments, and then our approach to automatically detect common workflow fragments, tested in the LONI Pipeline (for more details I encourage you to follow the links to the presentations). The only thing that bothered me a bit was that my presentations were scheduled at strange hours. I had the last slot before dinner for the first one, and then I was the first presenter on the last day at 8:30 am for the second one. Here is a picture of the brave attendees who woke up early on the last day; I really appreciated their effort :):

The brave attendants that woke up early to be at my talk at 8:30 am

But let’s get back to the workshop, demos and conference. As I introduced above, the first 2 days included workshop talks, demos and tutorials. Here are my highlights:

Workshops and demos:

Microsoft is investing in scientific workflows!: I attended the Azure research training workshop, where Mateus Velloso introduced the Azure infrastructure for creating and setting up virtual machines, web services, websites and workflows. It is really impressive how easily you are able to create and run experiments with their infrastructure, although you are limited to their own library of software components (in this case, a machine learning library). If you want to add your own software, you have to expose it as a web service.

Impressive visualizations using Excel sheets at the Demofest! All the demos belonged to Microsoft (guess who was one of the main sponsors of the conference), although I have to admit that they looked pretty cool. I was impressed by two demos in particular, the SandDance beta and the WorldWide Telescope. The former loads Excel files with large datasets so you can play with the data: select, filter and plot the resources by different facets. It is easy to use and its animations are very fluid. The latter was similar to Google Maps, but you were able to load your Excel dataset (more than 300K points at the same time) and show it in real time. For example, in the demo you could draw the itineraries of several whales in the sea at different points in time, and show their movement minute after minute.

Microsoft demo session. With caipirinhas!

New provenance use cases are always interesting. Dario Oliveira introduced their approach to extract biographic information from the Brazilian Historical Biographical Dictionary at the Digital Humanities Workshop. This included not only the life of the different persons collected as part of the dictionary, but also each reference that contributed to tell part of the story. Certainly a complex and interesting use case for provenance, which they are currently refining.

Paul Watson received the Jim Gray Award. In his keynote, he talked about social exclusion and the effect of digital technologies. Being unable to get online may stop you from accessing many services, and he presented ongoing work on helping people with accessibility problems (even through scientific workflows). Clouds play an important role too, as they have the potential to deal with the fast growth of applications. However, the people who could benefit the most from the cloud often do not have the resources or skills to do so. He also described e-Science Central, a workflow system for easily creating workflows in your web browser, with provenance recording and exploring capabilities and the possibility to tune and improve the scalability of your workflows with the Azure infrastructure. The keynote ended by highlighting how important it is to make things fun for the user (“gamification” of evaluations, for example), and how important eScience is for computer science research: new challenges are continuously presented, supported by real use cases in application domains with a lot of data behind them.

I liked the three dreams for eScience of the “strategic importance of eScience” panel:

  1. Find and support the misfits, by addressing those people with needs in eScience.
  2. Support cross domain overlap. Many communities base their work on the work made by other communities, although the collaboration rarely happens at the moment.
  3. Cross domain collaboration.
First panel of the conference

Conference general highlights:

Great discussion in the “Going native” panel, chaired by Tony Hey, with experts from chemistry, scientific workflows and ornithology (talk about domain diversity). They analyzed the key elements of a successful collaboration, explaining how in their different projects they have a wide range of collaborators. It is crucial to have passionate people, who don’t lose momentum after the grant for the project has been obtained. For example, one of the best databases for accessing chemical descriptions in the UK came out of a personal project initiated by a minority. In general, people like to consume curated data, but very few are willing to contribute. In the end what people want is to have impact. Showing relevance and impact (or reputation, altmetrics, etc.) will attract additional collaborators. Finally, the issue of data interoperability between different communities was brought up for discussion. Data without methods is in many cases not very useful, which supports part of the work I’ve been doing during the last years.

Awesome keynotes!! The one I liked the most was given by Noshir Contractor, who talked about “Grand Societal Challenges”. The keynote was basically about how to assemble a “dream team” of people for delivering a product/proposal, and all the analyses that had been done to determine which factors are the most influential. He started by talking about the Watson team, who built a machine capable of beating a human on TV, and continued by presenting the tendencies people have when selecting members for their own teams. He also presented a very interesting study of videogames as “leadership online labs”. In videogames very heterogeneous people meet, and they have to collaborate in groups in order to be successful. The takeaway conclusion was that diversity in a group can be very successful, but it is also very risky and often ends in failure. That is why people tend to collaborate with people they have already collaborated with when writing a proposal.

The keynote by Kathleen R. McKeown was also amazing. She presented a high level overview of the work in NLP developed in their group concerning summarization of news, journal articles, blog posts, and even novels! (which IMO has a lot of merit without going into the detail). She presented co-reference detection of events, temporal summarization, sub-event identification and analysis of conversations in literature, depending on the type of text being addressed. Semantics can make a difference!

New workflow systems: I think I haven’t seen an eScience conference without new workflow systems being presented 😀 In this case the focus was more on the efficient execution and distribution of the resources. Dispel4py and Tigres workflow systems were introduced for scientists working in Python.

Cross domain workflows and scientific gateways:

Antonella Galizia presented the DRIHM infrastructure to set up Hydro-Meteorological experiments in minutes. Impressive, as they had to integrate models for meteorology, hydrology, pluviology and hydraulic systems, while reusing existing OGC standards and developing a gateway for citizen scientists. A powerful approach, as they were able to do flooding predictions in certain parts of Italy. According to Antonella, one of the biggest challenges in achieving their results was to create a common vocabulary which could be understood by all the scientists involved. Once again we come back to semantics…

Rosa Filgueira presented another gateway, but for volcanologists and rock physicists. Scientists often have trouble sharing data among different disciplines, even if they belong to the same domain (geology in this case). This is because every lab often records their data in a different way.

Finally, Silvia Olabarriaga gave an interesting talk about workflow management in astrophysics, heliophysics and biomedicine, distinguishing the conceptual level (user in the science gateway), abstract level (scientific workflow) and concrete level (how the workflow is finally executed on an infrastructure), and how to capture provenance at these different granularities.

Other more specific work that I liked:

  • A tool for understanding copyright in science, presented by Richard Hoskings. A plethora of different licenses coexist in the Linked Open Data, and it is often difficult to understand how one can use the different resources exposed on the Web. This tool helps guide the user through the possible consequences of using one resource or another in their applications. Very useful to detect any incompatibility in your application!
  • An interesting workflow similarity approach by Johannes Starlinger, which improves the current state of the art by making efficient matching on workflows. Johannes said they would release a new search engine soon, so I look forward to analyzing their results. They have published a corpus of similar workflows here.
  • Context of scientific experiments: Rudolf Mayer presented the work done in the Timbus project to capture the context of scientific workflows. This includes their dependencies, methods and data at a very fine granularity. Definitely related to Research Objects!
  • An agile annotation of scientific texts to identify and link biomedical entities by Marcus Silva, with the particularity of being capable of loading very large ontologies to do the matching.
  • Workflow ecosystems in Pegasus: Ewa Deelman presented a set of combinable tools for Pegasus able to archive, distribute, simulate and efficiently re-compute workflows. All tested with a huge workflow in astronomy.
  • Provenance is still playing an important role in the conference, with a whole session for related papers. PROV is being reused and extended in different domains, but I still have to see an interoperable use across different domains to show its full potential.
Conference dinner and dance with a live band

In summary, I think the conference has been a very positive experience and definitely worth the trip. It is very encouraging to see that collaborations among different communities are really happening thanks to the infrastructure being developed on eScience, although there are still many challenges to address. I think we will see more and more cross domain workflows and workflow ecosystems in the next years, and I hope to be able to contribute with my research.

I also got plenty of new references to add to the state of the art of my thesis, so I think that I also did a good job of talking to people and letting others know about my work. Unfortunately my return flight was delayed and I missed my connection back to Spain, turning my 14 hour trip home into almost 48 hours. Certainly the longest journey from any conference I have attended.

The Beyond the PDF Workshop

The Second Beyond the PDF workshop finally took place last week in Amsterdam (fortunately I got travel support from the organizers, so I was able to attend the full event). If I had to pick a word to describe the workshop, it would be “different”. As Paul Groth (one of the chairs) summarizes in his post, the audience was heterogeneous: there were people from biomedical, humanities, social sciences and physical sciences domains, belonging to different types of organizations (ranging from academic to governmental). Publishers and editors were also present, and many different tools, visions and ideas were presented to improve the future of scholarly communication. This whole context was a bit different from what one is used to seeing at other conferences, where you find people doing similar things to what you do, and you discuss your research rather than the idea of how to communicate it to others. Here people were not afraid to tell publishers and editors why they thought the system was broken, presenting their arguments in an informal, friendly environment.

Another interesting fact was the “second screen” showing the Twitter wall live. People were very active, highlighting the interesting quotes from the talks and initiating debates in parallel to all the sessions. Even today the tag #btpdf2 is still active. Congrats to all the organizing staff!

While the speakers were presenting, some artists were drawing the Beyond the PDF wall

Detailed summary and highlights

The program of the workshop is available here. Below you can see the summary and highlights from the different sessions and interesting quotes I wrote down in my notes.

Day 1:

The day started with a Keynote by Kathleen Fitzpatrick, who explained how the book is not dead, although the academic book is kind of dying. The blog could be a replacement, since it is a kind of alternative way to publish the resources. You are able to get comments from the community, feedback suggestions and support. Why couldn’t we be our own publishers?

The current reviewing process has problems; could it be part of what is broken? Bias and flaws are not unusual, and reviewing requires a great deal of labor for which we normally don’t receive much credit. As an example, she explained how the book she had been writing had more impact in blog form than in its final published format.

Finally, she remarked on how important online communities are. If you build a tool or a service without a community, people will not just come. You have to build a community first. Some interesting quotes: “Publishers will have to focus more on services and less on selling digital objects”. “We need filters, not gatekeepers” (referring to publishers and editors). “The network is not a threat. It helps to reach more people”.

Laura Czerniewicz and Michelle Willmers followed the keynote with a session on context. They highlighted the dangers of completely open access: will it become a flood of content? There is a need for a reward system. What do authors get from open access? Editors are gatekeepers. Another important factor is that in the end only journal articles are considered when judging the validity of a researcher. Tweets, blogs, talks, workshops and conferences are ignored, even when they could have had more impact than the actual journals. In most cases journal articles are the tip of the iceberg.

Next, in the Vision session, Nathan Jenkins introduced Authorea, a very cool tool built on Ruby on Rails for writing articles online without having to deal with LaTeX compilation. Mercé Crossas presented Dataverse, a portal for archiving data results for citation purposes, motivated by the volatility of the links in old papers. Amalia S. Levi explained how in historical research a lot of the data already existed, but the links were missing. (This reminds me of some conversations that I’ve had recently about how papers are cited in the scientific community. It turns out that sometimes this is the case nowadays as well). Joost Kircz hit the spot in his speech (in my opinion): Are we going Beyond the pdf or Beyond the essay? An enhanced pdf is still stuck in the page paradigm. Papers represent structured or randomized knowledge that should be browsed, and that is often not possible in a book. I liked his ending statement: “Publishing is not a science, but is a craft”. Lisa Girard followed with StemBook, a portal where all the authors could keep their findings up to date, allowing the community to review their work in stem cell biology. An interesting thing about it is that people could upload their protocols and annotate them using Domeo, aligned with the Annotation Ontology. Paolo Ciccarese followed with an overview of that ontology, summarizing their efforts and collaboration in the community in order to come up with a highly adopted standard.

As a small comment to this session, I think it is a bit curious that so many finished (or nearly finished) tools were presented in a “Vision” session. It would have been interesting to see how some of the presenters picture the future of publication and how to get there (either by using some of the presented tools or not).

After lunch there was a session on new models for content dissemination, where Theodora Bloom started by stating very clearly what the main current problems are for dissemination:

  1. Access to what you want to read and use.
  2. Publication venue as a measure of quality.
  3. Having to repeat the cycle of publication in different journals.
  4. Poor links for underlying data.

She also explained how PLOS ONE also publishes research leading to negative results, but hardly anyone submits it. I really liked this; it reminded me of a quote from Thomas Edison: “I have not failed. I’ve just found 10,000 ways that won’t work”. If an idea looks promising but doesn’t work as expected, it’s important to share it with the community so that nobody else repeats the same mistake. Who knows, it might even inspire other people to come up with a better solution.

Brian Hole followed talking about metajournals and the social contract of science, combining it in the idea of an Ultrajournal.

The second part of the session was introduced by a lively Jason Priem, who talked about how the printing press had been the first revolution in disseminating content and the Internet the second one. According to him, we should mine the network in order to produce the appropriate filters for the information. Keith Collier followed by introducing Rubriq, an independent peer-review system that aims to decouple peer review from publication. Next, Kaveh Bazargan described the current concerns about typesetters, and how we should get rid of them. Instead, XML or blog posts should be the new typesetters, giving more freedom to the writer. Finally, Peter Bradley talked about Hypothes.is, an open source platform for the evaluation of information, and Alf Eaton introduced PeerJ, an open access peer-reviewed journal with metadata for all its papers.

The final session of the day was about the business case, where three representatives explained different business models and three stakeholders plus the audience asked questions about them. Wim van der Stelt argued that at Springer they are not resisting the change, and Mark Hahnel argued that authors should be able to receive credit for their data, as happens in FigShare. The discussion brought some interesting topics to the table, such as the fact that scholarly communication per se is not profitable and needs government funding, how to move from the journal impact factor to a measure that is meaningful (and convince the government to support it), and how to share our work with those who cannot afford to pay for it. Another important observation was the number of hours researchers spend per year on rejected papers, which adds up to 11-16 million!

The day ended with the session on demos and posters. Marco Roos and Aleix Garrido were by my side talking about the wf4ever project, while I spoke a bit about the work done reproducing the TB-Drugome workflow. The slides can be seen here.

Day 2

Carol Teinoir started the day by trying to analyze and understand the needs of scholars. She gave a lot of metrics about the main reasons scholars give for not sharing their data (“I have not the time”, or “I’m not required to” were among the top five), and how successful researchers turn out to read more. She also gave metrics on who is sharing data versus who is willing to share their data, and analyzed how e-books had influenced printed copies. An interesting fact: in Australia, e-books have almost replaced printed copies.

The “Making it happen” session was next. Asunción Gómez Pérez talked about the SEALS evaluation platform, which allows the different tests of an experiment to be reproduced automatically. Graeme Hirst spoke about usability, the “neglected dimension”, and how we are “forced” to use systems with low usability like Word and LaTeX. The gain should be greater than the pain when writing a paper. Rebecca Lawrence followed, talking about data review and how to share data: the requirement of a data sharing plan, how things should be done according to standards, where to find the funding for the previous two, how we should refuse papers whose data is not accessible, and how a reviewer should have access to all the materials in order to properly review the paper.

Asun Gómez Pérez talking about the SEALS platform

The session finished with several short presentations that can be accessed here. Anita de Waard insisted on the need for a new reward system, although no further details were given. I also liked the talk by Melissa Haendel on reproducibility in science, even if she didn’t talk about the role of scientific workflows in reproducibility. Another interesting tool was ORCID, a registry for scholars with author disambiguation. Gully Burns ended the session by analyzing how the different parameters change an experiment.

We broke out into different sessions during lunch. I went to the reproducibility one, where we shared the different issues that currently exist when trying to store and rerun experiments. However, unlike the data citation group, we didn’t come up with a manifesto.

The next session dealt with new models for the evaluation of research, where the organizer, Carole Goble, proposed a little role play. Each of the 6 participants wore a different hat representing the role of their institution. Phil Bourne was the institutional dean (officer hat); Victoria Stodden (with the typical English bureaucratic hat, on the right of the picture) represented the public funding agencies; Christine Borgman represented the digital libraries (second-hand cowboy hat); Jan Reichelt, with the “cool” hat on the left, represented the commercial funders; Scott Edmunds represented the publishers with a top hat (unfortunately he wasn’t wearing it in the picture); and Steve Pettifer represented academia (he can’t be seen properly in the picture).

Roles in academic funding and dissemination of research

The summary of the discussion was as follows, for each role:

  • Funding agencies: they are not interested in the evaluation of the academic research. It should be driven by the community.
  • The dean: I’ll quote Phil’s performance:

“Oh, we have produced a 200 page report about the possible changes that we could do to the system.

–  And what are you going to change?

– Very little!”

It is events like this one that provide the new ideas.

  • Publishers and academia: death to impact factor.
  • Commercial funders: code and methods matter. They should be treated as first-class citizens (I couldn’t agree more).
  • Digital libraries: The standards are problematic. Tools don’t connect, and interoperability is an issue.

The final session, Visions for the future, grouped a set of flash talks from very different people. The most successful ones were given by Carole Goble (winner), who looked at the publication of data from a software engineering perspective, and how we could do several releases of the data as happens with software releases: “Don’t publish, release!”; Stian Haklev, with his proposal to create an alternative to Google Scholar (I liked his answer to Ed Hovy, when he asked what was new in his proposal: “There is nothing new about this, and that is precisely what is new, that we are just able to make it”); Jeffrey Lancaster, with his proposal to change the CSL citation styles; and Kaveh Bazargan, who demanded that publishers release the XML of the papers instead of the pdf. The job of a publisher should be to disseminate content, not to dictate how we read the papers. He even did an online demo of a tool that could render the pdf from the XML in several different ways depending on the user’s preferences.

I also found the proposal by Alejandra Gonzalez-Beltran interesting: she talked about isa-tools, a platform used by pharmaceutical companies for the collection, curation and reuse of datasets. And of course there was the idea of Olga Giraldo, who wants to provide the means to transform laboratory protocols into nanopublications and provide checklists to organize them properly. Below you can see a picture of the participants in the session:

Participants of the "Make it Happen" session.

And that’s all! In summary, I think it was a nice event with a lot of discussion and demands from academia to editors, publishers and funding agencies. Of course, I guess that part of the motivation of the workshop is for them to gather ideas on how the system could be changed, plus an overview of the state of the art in tools and platforms that they could incorporate into their systems.

Results, next steps?

There was a lot of debate but no session on what the next steps should be. I think this would have been an interesting thing to have, although it is difficult to fit it all into a 2-day event. As a result, some of the people participating in the breakout sessions wrote the “Data citation manifesto”, which I would really like people to follow in order to give credit for data (link here, please share!).

Also, the idea of an open Google Scholar (an open alternative, as OpenStreetMap is to Google Maps) looks promising. I hope it gets implemented!

And finally, some personal thoughts. After attending the event I realized that, as a computer scientist working to enable reproducibility and reusability of other people’s work, sometimes in my own area we don’t follow the reproducibility principles: papers about tools that are no longer available after a while, published algorithms without an implementation, unstable links, etc. I have always tried to include a reference to the code and evaluations done in my work so that reviewers can access them, but I might start using some of the tools shown in the workshop for the sake of preservation.

The map with the main topics discussed on day 2