Exploiting scientific containers, cloud computing, and cloud data services, we present a framework for performing and communicating scalable, reproducible, and extensible science in the cloud. We demonstrate the capability to process massive amounts of data in parallel in the cloud, and we run a web service that lets users interact directly with the tools and data presented. We hope this model will inspire the community to produce reproducible and, importantly, extensible results, which will enable us to collectively accelerate the rate at which scientific breakthroughs are discovered, replicated, and extended.
Running as a persistent Jupyter notebook, our demonstration walks through the ndmg pipeline and quality control.
Download either the frozen-for-publication or the up-to-date version of our code and try sic yourself!
Our pipeline is integrated with several platforms and has been used to process a variety of datasets in the cloud. We encourage you to pull on one of these threads.