Presentations

Variant calling and bcbio training for the Harvard Chan Bioinformatics Core In Depth NGS Data Analysis Course (10 October 2018): slides
Building a diverse set of validations; lightning talk at the GCCBOSC2018 Bioinformatics Community Conference: slides
bcbio training at the GCCBOSC2018 Bioinformatics Community Conference, focusing on bcbio CWL integration with variant calling analyses on Personal Genome Project examples (26 June 2018): slides; video
Description of bcbio and Common Workflow Language integration with a focus on parallelization strategies. From a bcbio discussion with Peter Park’s lab at Harvard Medical School (26 January 2018): slides
In-depth description of bcbio and Common Workflow Language integration, including motivation and practical examples of running on clusters, DNAnexus, SevenBridges and Arvados. From the Boston Bioinformatics Interest Group meeting (2 November 2017): slides; video
bcbio practical interoperability with the Common Workflow Language at BOSC 2017 (22 July 2017): slides; video
Teaching variant calling, bcbio and GATK4 validation at the Summer 2017 NGS Data Analysis Course at Harvard Chan School (6 July 2017): slides
Training course for the Cancer Genomics Cloud, describing how bcbio uses the Common Workflow Language to run in multiple infrastructures (1 May 2017): slides
Talk at the MIT Bioinformatics Interest Group on how the Common Workflow Language enables interoperability with multiple workflow engines (3 November 2016): slides and video
Broad Institute software engineering seminar about bcbio validation and integration with Common Workflow Language and Workflow Definition Language (28 September 2016): slides
Materials from teaching at the Summer 2016 NGS Data Analysis Course at Harvard Chan School (11 August 2016): slides
Bioinformatics Open Source Conference (BOSC) 2016 lightning talk on bcbio and the Common Workflow Language (8 July 2016): slides and video
Materials from teaching at the Spring 2016 NGS Data Analysis Course at Harvard Chan School (28 April 2016): slides
Statistical Genetics and Network Science Meeting at Channing Division of Network Medicine (23 March 2016): slides
Presentation at Curoverse Brown Bag Seminar on bcbio and in-progress integration work with Common Workflow Language and Arvados (11 January 2016): slides
Materials from a teaching-oriented example at Cold Spring Harbor Laboratory’s Advanced Sequencing Technology and Applications course (18 November 2015): slides
Supporting the Common Workflow Language and Docker in bcbio, at the Bio in Docker symposium (9 November 2015): slides
Validation on human build 38, HLA typing, low frequency cancer calling and structural variation for Boston Bioinformatics Interest Group (BIG) meeting (5 November 2015): slides
Presentation on Research Scientist Careers for Iowa State Bioinformatics Course (23 September 2015): slides
Prioritization of structural variants based on known biological information at BOSC 2015 (10 July 2015): slides; video
Overview of variant calling for NGS Data Analysis Course at Harvard Medical School (19 May 2015): slides
NGS Glasgow (23 April 2015)
Boston Computational Biology and Bioinformatics meetup (1 April 2015): slides
Program in Genetic Epidemiology and Statistical Genetics seminar series at Harvard Chan School (6 February 2015): slides
Talk at Good Start Genetics (23 January 2015): slides
Boston area Bioinformatics Interest Group (15 October 2014): slides
University of Georgia Institute of Bioinformatics (12 September 2014): slides
Intel Life Sciences discussion (7 August 2014): slides
Bioinformatics Open Source Conference (BOSC) 2014: slides, conference website
Galaxy Community Conference 2014: slides, conference website
bcbio hackathon at Biogen (3 June 2014)
Harvard ABCD group slides (17 April 2014)
BIG meeting (February 2014)
Novartis slides (21 January 2014)
Mt Sinai: Strategies for accelerating the genomic sequencing pipeline: Mt Sinai workshop slides, Mt Sinai workshop website
Genome Informatics 2013
Bioinformatics Open Source Conference 2013: BOSC 2013 Slides, BOSC 2013 Video, BOSC 2013 Conference website
Arvados Summit 2013: Arvados Summit Slides, Arvados Summit website
Scientific Python 2013: SciPy 2013 Video, SciPy 2013 Conference website

Feel free to reuse any images or text from these talks. The slides are on GitHub.

Abstract

Community Development of Validated Variant Calling Pipelines

Brad Chapman, Rory Kirchner, Oliver Hofmann and Winston Hide

Harvard School of Public Health, Bioinformatics Core, Boston, MA, 02115

Translational research relies on accurate identification of genomic variants. However, rapidly changing best practice approaches in alignment and variant calling, coupled with large data sizes, make it a challenge to create reliable and reproducible variant calls. Coordinated community development can help overcome these challenges by sharing testing and updates across multiple groups. We describe bcbio-nextgen, a distributed multi-architecture pipeline that automates variant calling, validation and organization of results for query and visualization. It creates an easily installable, reliable infrastructure from best-practice open source tools with the following goals:

Quantifiable: Validates variant calls against known reference materials developed by the Genome in a Bottle consortium. The bcbio.variation toolkit automates scoring and assessment of calls to identify regressions in variant identification as calling pipelines evolve. Incorporation of multiple variant calling approaches from Broad’s GATK best practices and the Marth lab’s gkno software enables informed comparisons between current and future algorithms.
Scalable: bcbio-nextgen handles large population studies with hundreds of whole genome samples by parallelizing on a wide variety of schedulers and multicore machines, setting up different ad hoc cluster configurations for each workflow step. Work in progress includes integration with virtual environments, including Amazon Web Services and OpenStack.
Accessible: Results automatically feed into tools for query and investigation of variants. The GEMINI framework provides a queryable database associating variants with a wide variety of genome annotations. The o8 web-based tool visualizes the work of variant prioritization and assessment.
Community developed: bcbio-nextgen is widely used in multiple sequencing centers and research laboratories. We actively encourage contributors to the code base and make it easy to get started with a fully automated installer and updater that prepares all third-party software and reference genomes.
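
As a rough illustration of the "Quantifiable" goal above, the sketch below scores a call set against a truth set and flags a regression between two pipeline versions. It is not the bcbio.variation API: the names `Score`, `score_calls` and `is_regression` are hypothetical, the variants are made-up toy data, and real comparisons also need to handle variant normalization, genotypes and confident regions, which this ignores.

```python
from typing import NamedTuple, Set, Tuple

# A variant is identified here by (chromosome, position, ref allele, alt allele).
Variant = Tuple[str, int, str, str]


class Score(NamedTuple):
    """Counts from comparing a call set to a truth set (hypothetical helper)."""
    true_pos: int
    false_pos: int
    false_neg: int

    @property
    def sensitivity(self) -> float:
        total = self.true_pos + self.false_neg
        return self.true_pos / total if total else 0.0

    @property
    def precision(self) -> float:
        total = self.true_pos + self.false_pos
        return self.true_pos / total if total else 0.0


def score_calls(calls: Set[Variant], truth: Set[Variant]) -> Score:
    """Count exact-match true positives, false positives and false negatives."""
    return Score(len(calls & truth), len(calls - truth), len(truth - calls))


def is_regression(old: Score, new: Score, tolerance: float = 0.0) -> bool:
    """Report a regression if sensitivity or precision drops beyond tolerance."""
    return (new.sensitivity < old.sensitivity - tolerance
            or new.precision < old.precision - tolerance)


# Toy data, invented for illustration only.
truth = {("chr1", 100, "A", "G"), ("chr1", 250, "C", "T"), ("chr2", 75, "G", "A")}
old_calls = {("chr1", 100, "A", "G"), ("chr1", 250, "C", "T"), ("chr2", 75, "G", "A")}
new_calls = {("chr1", 100, "A", "G"), ("chr2", 75, "G", "A"), ("chr2", 90, "T", "C")}

old_score = score_calls(old_calls, truth)
new_score = score_calls(new_calls, truth)
print(new_score, is_regression(old_score, new_score))
```

Tracking these counts per pipeline version against a fixed reference set is one simple way to notice when an update to an aligner or caller changes results for the worse.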

Links from the presentation

HugeSeq
Genome Comparison & Analytic Testing at Bioplanet
Peter Block’s “Community” book
CloudBioLinux and Homebrew Science as installation frameworks; Conda as Python environment
bcbio documentation at Read the Docs
Arvados framework for metadata tracking, NGS processing and data provenance
Notes on improved scaling for NGS workflows
Genomic Reference Materials from Genome in a Bottle
Comparison of aligners and callers using NIST reference materials
Callers and minimal BAM preparation workflows
Coverage assessment