Reproducible Research at the Cloud Era
On the occasion of the IEEE CloudCom 2016 conference, which I organized with Prof. Pascal Bouvry in Luxembourg, I prepared a tutorial offering a set of practical sessions around selected tools we consider relevant to the many challenges raised by Reproducible Research:
- Accurate, organized and easy to read / take / share docs
- Sharing Code and Data
- Mastering your environment, keeping it clean and automated
Title: Reproducible Research at the Cloud Era
Online Tutorial: RR-tutorials.readthedocs.io
Abstract:
The term Reproducible Research (RR) refers to “the idea that the ultimate product of academic research is the paper along with the full computational environment used to produce the results in the paper such as the code, data, etc. that can be used to reproduce the results and create new work based on the research.” Source: Wikipedia.
The need for reproducibility is increasing dramatically as data analyses become more complex, involving larger datasets and more sophisticated computations. The advent of the Cloud Computing paradigm is expected to provide the appropriate means for RR. This tutorial is meant to provide an overview of the tools every researcher (in computer science and beyond) should be aware of to enable RR in their own work. In particular, after a general talk presenting RR and the existing associated tools and workflows, this tutorial will propose several practical, hands-on exercises meant to be performed on each attendee’s laptop, covering the management of shareable development environments using Vagrant. Resources for this tutorial will be available on GitHub.
Topics
- Overview of Reproducible Research (RR) and Open Challenges
- Relevant Tools for RR: git, make & Co., knitr, continuous integration using Travis CI / GitLab CI
- Creation and configuration of lightweight, reproducible and portable environments using Virtual Machines
- Installation, configuration and generation of Vagrant boxes
- Box provisioning using Puppet
- Vagrant providers
Level: beginner to advanced
See Also: if you want to know more about Reproducible Research, your primary source of information is the excellent Series of Webinars on Reproducible Research organized by Arnaud Legrand and his colleagues from CNRS, Inria, University of Grenoble, ENS etc.
Part of the material proposed in this tutorial actually comes from this source, and I would like to thank Arnaud again for allowing me to reuse it.
Other resources you might want to check:
- Reproducible Builds
- Figshare: simplifying your research workflow
- Validation: the Science Exchange Network
Credits for the pictures: Valentin Plugaru