Structural variant calling
bcbio can detect larger (>50bp) structural variants such as deletions, insertions, inversions and copy number changes, for both germline population and cancer variant calling.
To enable structural variant calling, specify svcaller options in the algorithm section of your configuration:

    - description: Sample
      algorithm:
        svcaller: [lumpy, manta, cnvkit]

Split read callers (primary use case - germline WGS sequencing):
Read-depth based CNV callers (primary use case - T/N cancer CNV calling)
This example runs structural variant calling with multiple callers (Lumpy, Manta and CNVkit), providing a combined output summary file and validation metrics against NA12878 deletions. It uses the same NA12878 input as the whole genome trio example.
To run the analysis do:

    mkdir -p NA12878-sv-eval
    cd NA12878-sv-eval
    wget https://raw.githubusercontent.com/bcbio/bcbio-nextgen/master/config/examples/NA12878-sv-getdata.sh
    bash NA12878-sv-getdata.sh
    cd work
    bcbio_nextgen.py ../config/NA12878-sv.yaml -n 16

This is a large whole genome analysis, and the timing and disk space requirements for the NA12878 trio analysis above apply here as well.
svcaller – List of structural variant callers to use: [lumpy, manta, cnvkit, gatk-cnv, seq2c, purecn, titancna, delly, battenberg]. LUMPY and Manta require paired-end reads. cnvkit and gatk-cnv should not be used on the same sample because their normalization approaches are incompatible; pick one or the other for CNV calling.
svprioritize – Produce a tab-separated summary file of structural variants in regions of interest. This complements the full VCF files of structural variant calls to highlight changes in known genes. See the paper on cancer genome prioritization for the full details. The value can be either the path to a BED file (with columns chrom start end gene_name, see Input file preparation) or the name of one of the pre-installed prioritization files:
cancer/civic (hg19, GRCh37, hg38) – Known cancer associated genes from CIViC.
cancer/az300 (hg19, GRCh37, hg38) – 300 cancer associated genes contributed by AstraZeneca oncology.
cancer/az-cancer-panel (hg19, GRCh37, hg38) – A text file of genes in the AstraZeneca cancer panel. This is only usable for svprioritize, which can take a list of gene names instead of a BED file.
actionable/ACMG56 – Medically actionable genes from the American College of Medical Genetics and Genomics.
coding/ccds (hg38) – Consensus CDS (CCDS) regions with 2bps added to internal introns to capture canonical splice acceptor/donor sites, and multiple transcripts from a single gene merged into a single all-inclusive gene entry.
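As an illustration of how svcaller and svprioritize combine, here is a minimal sketch of a sample entry. The sample name and the genome_build value are placeholders chosen for the example (cancer/civic is listed above as available for hg38). The choice of manta plus cnvkit is only one possible combination that keeps to a single CNV caller.

```yaml
# Illustrative sample entry; "Sample" and the hg38 build are placeholders.
- description: Sample
  genome_build: hg38
  algorithm:
    # Use either cnvkit or gatk-cnv for CNV calling, never both on one sample.
    svcaller: [manta, cnvkit]
    # Name of a pre-installed prioritization file; a path to a BED file
    # (chrom start end gene_name) could be given here instead.
    svprioritize: cancer/civic
```

With a setup like this, the tab-separated prioritization summary is produced alongside the full structural variant VCF files.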
fusion_mode – Enable fusion detection in RNA-seq when using STAR (recommended) or Tophat (not recommended) as the aligner. OncoFuse is used to summarise the fusions but currently only supports GRCh37. For explant samples, disambiguate enables disambiguation of STAR output [false, true]. This option is deprecated in favor of fusion_caller.
fusion_caller – Specify a standalone fusion caller for fusion mode. Supports oncofuse for STAR/tophat runs and ericscript for all runs. If a standalone caller is specified (i.e. ericscript), fusion detection will not be performed with the aligner. oncofuse only supports human genome builds GRCh37 and hg19. ericscript supports human genome builds GRCh37, hg19 and hg38 after installing the associated fusion databases (see Customizing data installation).
known_fusions – A TAB-delimited file of the format gene1<TAB>gene2, where gene1 and gene2 are identifiers of genes specified under gene_name in the attributes part of the GTF file.
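The fusion options above could be set together in an RNA-seq sample entry along the following lines. This is a sketch under assumptions: the sample name, genome build and file path are placeholders, the list form of fusion_caller is assumed by analogy with svcaller, and the gene pair in the comment is only an example of the two-column layout. Which callers consult known_fusions is not stated in this section.

```yaml
# Illustrative RNA-seq sample entry; names and paths are placeholders.
- description: Sample-rnaseq
  genome_build: hg38
  analysis: RNA-seq
  algorithm:
    aligner: star
    # Standalone caller: fusion detection is then not performed with the aligner.
    # ericscript needs the associated fusion databases installed beforehand.
    fusion_caller: [ericscript]
    # TAB-delimited file, one gene pair per line, using gene_name values
    # from the GTF attributes, for example:
    #   BCR<TAB>ABL1
    known_fusions: /path/to/known_fusions.tsv
```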
Validation of germline structural variant detection using multiple calling methods to validate against deletions in NA12878. This implements a pipeline that works in tandem with SNP and indel calling to detect larger structural variations like deletions, duplications, inversions and copy number variants (CNVs).
Validation of tumor/normal calling using the synthetic DREAM validation set. This includes validation of additional callers against duplications, insertions and inversions.
See references to individual tools on the citations page.