A Framework for Extensible Science

Exploiting scientific containers, cloud computing, and cloud data services, we present a framework for performing and communicating scalable, reproducible, and extensible science in the cloud. We show the capability to process massive amounts of data in parallel in the cloud, and we run a web service that enables hands-on interaction with, and demonstration of, the tools and data presented. We hope this model will inspire the community to produce reproducible and, importantly, extensible results, which will enable us to collectively accelerate the rate at which scientific breakthroughs are discovered, replicated, and extended.

Read our paper @

GigaScience (coming soon) arXiv

Try our Demo!

Running as a persistent Jupyter server on Amazon's EC2, our demonstration walks through the ndmg pipeline and quality control. Because the cloud demo can be overloaded when several people run it at once, we have also deployed a version on our local cluster.

Fork our Code

Download either the frozen-for-publication or the up-to-date version of our code and try SIC yourself!

Use the Cloud

Our pipeline is integrated with a variety of platforms, and has been used to process a variety of datasets in the cloud. We encourage you to pull on one of these threads.


Kiar, G; Gorgolewski, K J; Kleissas, D; Gray Roncal, W; Litt, B; Wandell, B; Poldrack, R A; Wiener, M; Vogelstein, R J; Burns, R; Vogelstein, J T
Corresponding Author: Joshua T. Vogelstein jovo@jhu.edu


Kiar, G; Gorgolewski, K J; Kleissas, D; Gray Roncal, W; Litt, B; Wandell, B; Poldrack, R A; Wiener, M; Vogelstein, R J; Burns, R; Vogelstein, J T (2017): Example use case of SIC with the ndmg pipeline (SIC:ndmg). GigaScience Database. http://dx.doi.org/10.5524/100285