Introduction

bcbio-nextgen provides best-practice pipelines for automated analysis of high-throughput sequencing data, with the goal of being:

  • Quantifiable: Doing good science requires being able to accurately assess the quality of results and re-verify approaches as new algorithms and software become available.
  • Analyzable: Results feed into tools that make them easy to query and visualize.
  • Scalable: Handle large datasets and sample populations on distributed heterogeneous compute environments.
  • Reproducible: Track configuration, versions, provenance and command lines to enable debugging, extension and reproducibility of results.
  • Community developed: The development process is fully open and sustained by contributors from multiple institutions. By working together on a shared framework, we can overcome the challenges associated with maintaining complex pipelines in a rapidly changing area of research.
  • Accessible: Bioinformaticians, biologists and the general public should be able to run these tools on inputs ranging from research materials to clinical samples to personal genomes.

Users

A sample of institutions using bcbio-nextgen for solving biological problems. Please submit your story if you’re using the pipeline in your own research.

  • Harvard School of Public Health: We use bcbio-nextgen within the bioinformatics core for variant calling on large population studies related to human health, such as breast cancer and Alzheimer’s disease. Increasing the scalability of the pipeline has been essential for handling study sizes of more than 1400 whole genomes.
  • Massachusetts General Hospital: The Department of Molecular Biology uses the pipeline to automatically process samples coming off Illumina HiSeq instruments. Automated pipelines perform alignment and sample-specific analysis, with results directly uploaded into a local Galaxy instance.
  • Science for Life Laboratory: The genomics core platform in the Swedish National Infrastructure (NGI) for genomics has crunched over 16 TBp (terabasepairs) and processed roughly 7000 samples from the beginning of 2013 until the end of July. UPPMAX, our cluster located in Uppsala, has run the pipeline in production since 2010.
  • Institute of Human Genetics, UCSF: The Genomics Core Facility uses bcbio-nextgen to process more than 2000 whole genome, exome, RNA-seq and ChIP-seq samples across various projects. The pipeline greatly lowers the barrier to accessing next generation sequencing technology. The community around it is also very helpful in providing best-practice advice and up-to-date solutions that ease scientific discovery.
  • The Translational Genomics Research Institute (TGen): Members of the Huentelman lab at TGen apply bcbio-nextgen to a wide variety of studies, with a major focus on the neurobiology of aging and neurodegeneration in collaboration with the Arizona Alzheimer’s Consortium (AAC) and the McKnight Brain Research Foundation. We also use bcbio in studies of rare diseases in children through TGen’s Center for Rare Childhood Disorders (C4RCD), and of other rare diseases such as Multiple System Atrophy (MSA). bcbio-nextgen has also been instrumental in projects for TGen’s Program for Canine Health & Performance (PCHP) and in numerous RNA-seq projects using rodent models. Our work with bcbio started with a partnership among Dell, the Neuroblastoma and Medulloblastoma Translational Research Consortium (NMTRC) and TGen as part of a Phase I clinical trial in these rare childhood cancers.