A Python toolkit providing best-practice pipelines for fully automated high-throughput sequencing analysis. You write a high-level configuration file specifying your inputs and analysis parameters. This input drives a parallel pipeline that handles distributed execution, idempotent processing restarts, and safe transactional steps. The goal is to provide a shared community resource that handles the data processing component of sequencing analysis, giving researchers more time to focus on the downstream biology.
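To make "idempotent processing restarts" and "safe transactional steps" concrete, here is a minimal sketch of the general pattern. It is not taken from the toolkit's source. The names `transactional_output` and `run_step` are hypothetical, chosen for illustration only. The idea is that a step writes to a temporary file and renames it into place only on success, so a restarted run can safely skip any step whose final output already exists.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def transactional_output(final_path):
    """Yield a temporary path; move it to final_path only if the block succeeds.

    Hypothetical helper for illustration, not the toolkit's own API.
    """
    out_dir = os.path.dirname(os.path.abspath(final_path))
    # Create the temp file next to the final file so the rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=out_dir, suffix=".tmp")
    os.close(fd)
    try:
        yield tmp_path
        # Reached only when the step finished without raising.
        os.replace(tmp_path, final_path)
    finally:
        # After a failure the partial temp file is still there; remove it.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)


def run_step(final_path, produce):
    """Run produce(tmp_path) unless final_path already exists.

    Returns True if the step ran, False if it was skipped on restart.
    """
    if os.path.exists(final_path):
        return False
    with transactional_output(final_path) as tmp_path:
        produce(tmp_path)
    return True
```

With this pattern, a step that crashes halfway leaves no file at the final path, so rerunning the pipeline repeats only the unfinished steps.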
Contents
User stories
- Somatic (cancer) variants
- Bulk RNA-seq
- Counting cells and transcripts for inDrops3 data
- PureCN analysis of tumor-only samples
- HLA typing
- Small germline variants
- 3’ DGE
- Structural variant calling
- ATAC-seq
- Methylation
- Variant calling using bulk RNA-seq data
- Detecting gene fusions with bulk RNA-seq data
- fast RNA-seq
- Disambiguation
- smallRNA-seq
Infrastructure
- Installation
- Configuration
- Parallel execution
- Outputs
- Project directory:
- Sample directories:
- Why do I have so many coverage metrics? Which one should I use?
- Interpretation of ontarget_pct vs usable_pct
- Interpretation of mosdepth median coverage vs qualimap median coverage
- Interpretation of bcbio(mosdepth) average target coverage vs qualimap mean coverage
- Why am I getting Ontarget_pct > 100?
- Downstream analysis
- Common Workflow Language (CWL)
- Cloud
- Development
Misc
- Users
- Internals
- Presentations
- Teaching
- Single cell RNA-seq analysis
- Cancer tumor-normal variant calling
- Citations
- Variant calling
- Read alignment
- Interval arithmetic
- Quality control
- Coverage and callable regions
- SNP and indels in germline (WES, WGS, gene panels)
- Structural and copy number variants in germline (WGS data)
- Somatic small variants
- Somatic copy number variants
- Variant annotation
- bulk RNA-seq
- Fusion calling - RNA-seq
- ATAC-seq
- small RNA-seq