Variant calling and bcbio training for the Harvard Chan Bioinformatics Core In Depth NGS Data Analysis Course (10 October 2018): slides
Building a diverse set of validations; lightning talk at the GCCBOSC2018 Bioinformatics Community Conference: slides
bcbio training at the GCCBOSC2018 Bioinformatics Community Conference, focusing on bcbio CWL integration, with variant calling analyses on Personal Genome Project examples (26 June 2018): slides; video
In-depth description of bcbio and Common Workflow Language integration, including motivation and practical examples of running on clusters, DNAnexus, SevenBridges and Arvados, from the Boston Bioinformatics Interest Group meeting (2 November 2017): slides; video
Teaching variant calling, bcbio and GATK4 validation at the Summer 2017 NGS Data Analysis Course at Harvard Chan School (6 July 2017): slides
Talk at the MIT Bioinformatics Interest Group on how Common Workflow Language enables interoperability with multiple workflow engines (3 November 2016): slides and video
Materials from teaching at the Summer 2016 NGS Data Analysis Course at Harvard Chan School (11 August 2016): slides
Materials from teaching at the Spring 2016 NGS Data Analysis Course at Harvard Chan School (28 April 2016): slides
Materials from a teaching-oriented example at Cold Spring Harbor Laboratory’s Advanced Sequencing Technology and Applications course (18 November 2015): slides
Supporting the Common Workflow Language and Docker in bcbio, at the Bio in Docker symposium (9 November 2015): slides
Validation on human build 38, HLA typing, low frequency cancer calling and structural variation for the Boston Bioinformatics Interest Group (BIG) meeting (5 November 2015): slides
Overview of variant calling for the NGS Data Analysis Course at Harvard Medical School (19 May 2015): slides
NGS Glasgow (23 April 2015)
Boston Computational Biology and Bioinformatics meetup (1 April 2015): slides
Program in Genetic Epidemiology and Statistical Genetics seminar series at Harvard Chan School (6 February 2015): slides
Talk at Good Start Genetics (23 January 2015): slides
Intel Life Sciences discussion (7 August 2014): slides
bcbio hackathon at Biogen (3 June 2014)
Harvard ABCD group slides (17 April 2014)
BIG meeting (February 2014)
Novartis slides (21 January 2014)
Genome Informatics 2013
Feel free to reuse any images or text from these talks. The slides are on GitHub.
Community Development of Validated Variant Calling Pipelines
Brad Chapman, Rory Kirchner, Oliver Hofmann and Winston Hide
Harvard School of Public Health, Bioinformatics Core, Boston, MA, 02115
Translational research relies on accurate identification of genomic variants. However, rapidly changing best-practice approaches in alignment and variant calling, coupled with large data sizes, make it challenging to produce reliable and reproducible variant calls. Coordinated community development can help overcome these challenges by sharing testing and updates across multiple groups. We describe bcbio-nextgen, a distributed multi-architecture pipeline that automates variant calling, validation and organization of results for query and visualization. It creates an easily installable, reliable infrastructure from best-practice open source tools with the following goals:
Quantifiable: Validates variant calls against known reference materials developed by the Genome in a Bottle consortium. The bcbio.variation toolkit automates scoring and assessment of calls to identify regressions in variant identification as calling pipelines evolve. Incorporation of multiple variant calling approaches from Broad’s GATK best practices and the Marth lab’s gkno software enables informed comparisons between current and future algorithms.
Scalable: bcbio-nextgen handles large population studies with hundreds of whole genome samples by parallelizing on a wide variety of schedulers and multicore machines, setting up different ad hoc cluster configurations for each workflow step. Work in progress includes integration with virtual environments, including Amazon Web Services and OpenStack.
Accessible: Results automatically feed into tools for query and investigation of variants. The GEMINI framework provides a queryable database associating variants with a wide variety of genome annotations. The o8 web-based tool visualizes the work of variant prioritization and assessment.
Community developed: bcbio-nextgen is widely used in multiple sequencing centers and research laboratories. We actively encourage contributions to the code base and make it easy to get started with a fully automated installer and updater that prepares all third-party software and reference genomes.
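The "Quantifiable" goal above, scoring calls against a reference set and flagging regressions as pipelines evolve, can be illustrated with a small Python sketch. This is not the bcbio.variation API: the names Score, score_calls and is_regression are hypothetical, and variants are compared by exact match on (chromosome, position, ref, alt). A real validation would also need to handle variant normalization, genotypes and confident-region filtering, none of which is attempted here.

```python
from typing import NamedTuple, Set, Tuple

# Hypothetical representation: (chromosome, position, ref allele, alt allele).
Variant = Tuple[str, int, str, str]


class Score(NamedTuple):
    """Counts from comparing a call set against a truth set."""
    true_positives: int
    false_positives: int
    false_negatives: int

    @property
    def precision(self) -> float:
        called = self.true_positives + self.false_positives
        return self.true_positives / called if called else 0.0

    @property
    def sensitivity(self) -> float:
        truth = self.true_positives + self.false_negatives
        return self.true_positives / truth if truth else 0.0


def score_calls(calls: Set[Variant], truth: Set[Variant]) -> Score:
    """Score calls by exact match against the truth set (an assumption)."""
    return Score(
        true_positives=len(calls & truth),
        false_positives=len(calls - truth),
        false_negatives=len(truth - calls),
    )


def is_regression(old: Score, new: Score, tolerance: float = 0.0) -> bool:
    """Flag a pipeline change that lowers sensitivity or precision."""
    return (
        new.sensitivity < old.sensitivity - tolerance
        or new.precision < old.precision - tolerance
    )


# Made-up example data, for illustration only.
truth = {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"),
         ("chr2", 50, "G", "A"), ("chr2", 75, "T", "C")}
old_calls = {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"),
             ("chr2", 50, "G", "A"), ("chr3", 10, "A", "T")}
new_calls = {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"),
             ("chr3", 10, "A", "T")}

old_score = score_calls(old_calls, truth)
new_score = score_calls(new_calls, truth)
print(old_score, new_score, is_regression(old_score, new_score))
```

Keeping the score as plain counts makes it simple to store one record per pipeline version and compare any two of them later.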