- In-depth description of bcbio and Common Workflow Language integration, including motivation and practical examples of running on clusters, DNAnexus, SevenBridges and Arvados. From the Boston Bioinformatics Interest Group meeting (2 November 2017): slides; video
- bcbio practical interoperability with the Common Workflow Language at BOSC 2017 (22 July 2017): slides; video
- Teaching variant calling, bcbio and GATK4 validation at the Summer 2017 NGS Data Analysis Course at Harvard Chan School (6 July 2017): slides
- Training course for the Cancer Genomics Cloud, describing how bcbio uses the Common Workflow Language to run on multiple infrastructures (1 May 2017): slides
- MIT Bioinformatics Interest Group about how Common Workflow Language enables interoperability with multiple workflow engines (3 November 2016): slides and video
- Broad Institute software engineering seminar about bcbio validation and integration with Common Workflow Language and Workflow Definition Language (28 September 2016): slides
- Teaching materials from the Summer 2016 NGS Data Analysis Course at Harvard Chan School (11 August 2016): slides
- Bioinformatics Open Source Conference (BOSC) 2016 lightning talk on bcbio and the Common Workflow Language (8 July 2016): slides and video
- Teaching materials from the Spring 2016 NGS Data Analysis Course at Harvard Chan School (28 April 2016): slides
- Statistical Genetics and Network Science Meeting at Channing Division of Network Medicine (23 March 2016): slides
- Presentation at Curoverse Brown Bag Seminar on bcbio and in-progress integration work with Common Workflow Language and Arvados (11 January 2016): slides
- Teaching-oriented example at Cold Spring Harbor Laboratory’s Advanced Sequencing Technology and Applications course (18 November 2015): slides
- Supporting the Common Workflow Language and Docker in bcbio at the Bio in Docker symposium (9 November 2015): slides
- Validation on human build 38, HLA typing, low frequency cancer calling and structural variation for Boston Bioinformatics Interest Group (BIG) meeting (5 November 2015): slides
- Presentation on Research Scientist Careers for Iowa State Bioinformatics Course (23 September 2015): slides
- Prioritization of structural variants based on known biological information at BOSC 2015 (10 July 2015): slides; video
- Overview of variant calling for NGS Data Analysis Course at Harvard Medical School (19 May 2015): slides
- NGS Glasgow (23 April 2015): slides
- Boston Computational Biology and Bioinformatics meetup (1 April 2015): slides
- Program in Genetic Epidemiology and Statistical Genetics seminar series at Harvard Chan School (6 February 2015): slides
- Talk at Good Start Genetics (23 January 2015): slides
- Boston area Bioinformatics Interest Group (15 October 2014): slides
- University of Georgia Institute of Bioinformatics (12 September 2014): slides
- Intel Life Sciences discussion (7 August 2014): slides
- Bioinformatics Open Source Conference (BOSC) 2014: slides, conference website
- Galaxy Community Conference 2014: slides, conference website
- bcbio hackathon at Biogen (3 June 2014)
- Harvard ABCD group slides (17 April 2014)
- BIG meeting (February 2014)
- Novartis slides (21 January 2014)
- Mt Sinai: Strategies for accelerating the genomic sequencing pipeline: Mt Sinai workshop slides, Mt Sinai workshop website
- Genome Informatics 2013: GI 2013 presentation slides
- Bioinformatics Open Source Conference 2013: BOSC 2013 Slides, BOSC 2013 Video, BOSC 2013 Conference website
- Arvados Summit 2013: Arvados Summit Slides, Arvados Summit website
- Scientific Python 2013: SciPy 2013 Video, SciPy 2013 Conference website
Feel free to reuse any images or text from these talks. The slides are on GitHub.
Community Development of Validated Variant Calling Pipelines
Brad Chapman, Rory Kirchner, Oliver Hofmann and Winston Hide. Harvard School of Public Health, Bioinformatics Core, Boston, MA, 02115
Translational research relies on accurate identification of genomic variants. However, rapidly changing best-practice approaches in alignment and variant calling, coupled with large data sizes, make it a challenge to create reliable and reproducible variant calls. Coordinated community development can help overcome these challenges by sharing testing and updates across multiple groups. We describe bcbio-nextgen, a distributed multi-architecture pipeline that automates variant calling, validation and organization of results for query and visualization. It creates an easily installable, reliable infrastructure from best-practice open source tools with the following goals:
- Quantifiable: Validates variant calls against known reference materials developed by the Genome in a Bottle consortium. The bcbio.variation toolkit automates scoring and assessment of calls to identify regressions in variant identification as calling pipelines evolve. Incorporation of multiple variant calling approaches from Broad’s GATK best practices and the Marth lab’s gkno software enables informed comparisons between current and future algorithms.
- Scalable: bcbio-nextgen handles large population studies with hundreds of whole-genome samples by parallelizing on a wide variety of schedulers and multicore machines, setting up different ad hoc cluster configurations for each workflow step. Work in progress includes integration with virtual environments, including Amazon Web Services and OpenStack.
- Accessible: Results automatically feed into tools for query and investigation of variants. The GEMINI framework provides a queryable database associating variants with a wide variety of genome annotations. The o8 web-based tool visualizes the work of variant prioritization and assessment.
- Community developed: bcbio-nextgen is widely used in multiple sequencing centers and research laboratories. We actively encourage contributions to the code base and make it easy to get started with a fully automated installer and updater that prepares all third-party software and reference genomes.
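
As a rough illustration of the Quantifiable goal above, the sketch below scores a set of variant calls against a truth set and flags a regression between two pipeline versions. It is not the bcbio.variation API: the names `score_calls` and `is_regression`, the `(chrom, pos, ref, alt)` tuple representation and the `tolerance` parameter are all assumptions made for this example. A real comparison would also need to handle variant normalization, genotypes and confident-region filtering, which are omitted here.

```python
from typing import Iterable, NamedTuple, Tuple

# Hypothetical minimal variant key: (chromosome, position, ref allele, alt allele).
Variant = Tuple[str, int, str, str]


class Score(NamedTuple):
    true_pos: int
    false_pos: int
    false_neg: int
    precision: float
    recall: float


def score_calls(calls: Iterable[Variant], truth: Iterable[Variant]) -> Score:
    """Compare called variants to a reference truth set by exact match."""
    call_set, truth_set = set(calls), set(truth)
    tp = len(call_set & truth_set)   # called and present in the truth set
    fp = len(call_set - truth_set)   # called but absent from the truth set
    fn = len(truth_set - call_set)   # in the truth set but not called
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return Score(tp, fp, fn, precision, recall)


def is_regression(old: Score, new: Score, tolerance: float = 0.0) -> bool:
    """Flag a new pipeline version whose precision or recall dropped."""
    return (new.precision < old.precision - tolerance
            or new.recall < old.recall - tolerance)


truth = [("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"),
         ("chr2", 50, "G", "A"), ("chr2", 75, "T", "C")]
old_calls = truth[:3] + [("chr3", 10, "A", "T")]
new_calls = truth[:2] + [("chr3", 10, "A", "T")]

old_score = score_calls(old_calls, truth)
new_score = score_calls(new_calls, truth)
print(old_score, new_score, is_regression(old_score, new_score))
```

Exact matching on sets keeps the example short. Tracking precision and recall per pipeline version is one simple way to notice when a tool update makes the calls worse.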