Prizes for Reproducibility in Audio and Music Research: How we evaluated the entries
We recently announced the results of our first Prizes for Reproducibility in Audio and Music Research. Here we describe how the evaluation was carried out and how the winners were chosen.
Submissions were assessed against three separate criteria:
- Ease of reproducibility of the results
- Quality of sustainability planning
- Potential to enable high-quality research in the UK audio and music research community.
The first criterion was applied only if the paper included something to reproduce, such as figures generated using a software program. Some submissions consisted of resources, such as datasets, intended to facilitate reproducible work by later researchers; these were assessed on the latter two criteria only. In future calls we will probably use separate categories for works that provide infrastructure for reproducible and sustainable work by others, rather than aiming at reproducibility themselves, but here the two were assessed together.
Each of these criteria was assessed by a separate panel, as described below. We then made a shortlist from those submissions which had scored 3 or better on every criterion. (The assessments used different scales, but in each case lower numbers scored better than higher ones.) Shortlisted submissions were assigned to categories according to the type of publication they contained, as listed in the call, and the winner in each category was the submission with the best average (mean) score.
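The selection procedure above can be sketched in a few lines of code. This is a minimal illustration, not the scoring tool we actually used: the submission names, categories, and scores below are invented, and the criterion labels are shorthand for the three criteria listed earlier.

```python
# Sketch of the shortlisting and winner-selection procedure.
# All data here is hypothetical, for illustration only.

SHORTLIST_THRESHOLD = 3  # lower scores are better on every scale

submissions = [
    # (name, category, {criterion: score}) -- invented values
    ("A", "journal paper",    {"reproducibility": 1, "sustainability": 2, "enabling": 2}),
    ("B", "journal paper",    {"reproducibility": 4, "sustainability": 1, "enabling": 1}),
    ("C", "conference paper", {"reproducibility": 2, "sustainability": 3, "enabling": 2}),
]

def shortlist(subs):
    """Keep submissions scoring 3 or better (i.e. <= 3) on every criterion."""
    return [s for s in subs
            if all(score <= SHORTLIST_THRESHOLD for score in s[2].values())]

def winners(subs):
    """Within each category, pick the shortlisted submission with the
    best (lowest) mean score across its criteria."""
    best = {}
    for name, category, scores in shortlist(subs):
        mean = sum(scores.values()) / len(scores)
        if category not in best or mean < best[category][1]:
            best[category] = (name, mean)
    return best
```

Note that a submission assessed on only two criteria (such as a dataset) simply has its mean taken over those two scores; the threshold test applies to whichever criteria it was scored on.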
Here is how the individual criteria were assessed:
Ease of reproducibility
To assess this criterion, we at SoundSoftware attempted to obtain and run the software associated with the paper and regenerate the results shown. This is a straightforward baseline replicability test.
The scale used for this criterion was:
- Excellent. A single command or script reproduced the figures in the paper.
- Good. It was possible to generate figures like those in the paper, perhaps incomplete or with some adjustment to parameters, but without code changes or author intervention.
- Passable. Results were generated but not without effort, for example modifying the code or reverse-engineering how to call it.
- Modest. Although we were able to run the code, no means was provided to reproduce the figures in the paper.
- Nil. We could not get the code to work.
Quality of sustainability planning

This criterion was assessed by a team at the Software Sustainability Institute: Tim Parkinson (Principal Software Consultant, Software Sustainability Institute and University of Southampton); Arno Proeme (Software Sustainability Institute and EPCC, University of Edinburgh); and Neil Chue Hong (Director of the Software Sustainability Institute). Many thanks to the SSI for their involvement in this work.
The sustainability assessment took into account factors such as whether the code and/or data were stored in a suitable repository, whether version control was used, whether tools for community involvement such as issue trackers and support mechanisms were available, and whether the work was properly licensed. These assessments included commentary and a score on a four-point scale.
See the Institute's sustainability evaluation pages for more information about their approach.
Potential to enable high-quality research
To assess this criterion, each submission was sent to two external reviewers. Many thanks to the willing reviewers: Tim Crawford; Dan Ellis; Fabien Gouyon; Panos Kudumakis; Piotr Majdak; Alan Marsden; Mark Plumbley; Bob Sturm; and Tillman Weyde. Their reviews included both commentary and a score on a five-point scale, ranging from Very good to Very weak.
When scoring this criterion, we took the average of the two reviewers' scores.