On the occasion of the IEEE CloudCom 2016 conference, which I organized with Prof. Pascal Bouvry in Luxembourg, I prepared a tutorial offering a set of practical sessions around selected tools we consider relevant to the many challenges raised by Reproducible Research:

  • Accurate, organized and easy to read / take / share docs
  • Sharing Code and Data
  • Mastering your environment, keeping it clean and automated, by:
    • Using common build tools (make, cmake etc.)
    • Using a constrained environment: sandboxed Ruby/Python, Vagrant, Docker
    • Automating its build through cross-platform recipes
    • Automatically testing your recipes for environment configuration
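
As a rough illustration of the last item, here is a minimal sketch in Python of a check that the tools an environment relies on are actually installed. The tool list and the function name `missing_tools` are assumptions made for this example, not part of the tutorial material; adapt them to your own setup.

```python
import shutil

# Hypothetical list of command-line tools the environment is expected to provide.
REQUIRED_TOOLS = ["git", "make", "vagrant"]


def missing_tools(tools):
    """Return the tools from `tools` that cannot be found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools(REQUIRED_TOOLS)
    if missing:
        print("Missing tools: " + ", ".join(missing))
    else:
        print("All required tools are available.")
```

Running such a check at the end of a provisioning recipe, or as a continuous integration step, gives an early warning when a recipe no longer produces the expected environment.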

Title: Reproducible Research at the Cloud Era

   Online Tutorial: RR-tutorials.readthedocs.io

   Download the slides (PDF)

Abstract:

The term Reproducible Research (RR) refers to “the idea that the ultimate product of academic research is the paper along with the full computational environment used to produce the results in the paper such as the code, data, etc. that can be used to reproduce the results and create new work based on the research.” Source: Wikipedia.

The need for reproducibility is increasing dramatically as data analyses become more complex, involving larger datasets and more sophisticated computations. The advent of the Cloud Computing paradigm is expected to provide the appropriate means for RR. This tutorial is meant to provide an overview of sensible tools that every researcher (in computer science, but not only) should be aware of to enable RR in their own work. In particular, after a general talk presenting RR and the existing associated tools and workflows, this tutorial will propose several practical exercises and hands-on sessions, meant to be performed on each attendee’s laptop, covering the management of shareable development environments using Vagrant. Resources for this tutorial will be available on GitHub.

Topics

  • Overview of Reproducible Research (RR) and Open Challenges
  • Relevant Tools for RR: git, make & Co., knitr, continuous integration using Travis CI / GitLab CI
  • Creation and configuration of lightweight, reproducible and portable environments using Virtual Machines
  • Installation, configuration and generation of Vagrant boxes
  • Box Provisioning using Puppet
  • Vagrant providers

Level: beginner - advanced

See Also: if you are interested in learning more about Reproducible Research, your primary source of information should be the excellent Series of Webinars on Reproducible Research organized by Arnaud Legrand and his colleagues from CNRS, Inria, University of Grenoble, ENS etc.:

In fact, part of the material presented in this tutorial comes from this source, and I would like to thank Arnaud again for allowing me to reuse it.

Other resources you might want to check:

Credits for the pictures: Valentin Plugaru