bcbio supports a configurable best-practices pipeline for smallRNA-seq quality control, adapter trimming, miRNA/isomiR quantification, and detection of other small RNAs.
Specific small RNA quantification (miRNA/tRNAs…):
seqbuster for miRNA annotation
MINTmap for tRNA fragment annotation
miRge2 for alternative small RNA quantification. To set up this tool, you need to install miRge2.0 manually and download the library data for your species; see the miRge2.0 documentation for how to install it and download the data. The option passed through resources is the parent directory of the species library: if the library is in /mnt/data/human, the option to pass is /mnt/data. Then set up:
resources:
  mirge:
    options: ["-lib $PATH_TO_PARENT_SPECIES_LIB"]
Quality control: FastQC
Other small RNA quantification:
mirDeep2 for miRNA prediction
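As a concrete sketch of the miRge2 setting described above, assuming the human library was downloaded to /mnt/data/human (the example path used in the text), the resources entry would point at the parent directory rather than at the species directory:

```yaml
# Assumes the species library lives in /mnt/data/human.
# -lib takes the parent directory, not the species directory itself.
resources:
  mirge:
    options: ["-lib /mnt/data"]
```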
The pipeline generates an Rmd template file inside the report folder that can be rendered with knitr. An example of the report is here. The count table (counts_mirna.tsv) for miRBase miRNAs will be inside the mirbase folder or the final project folder. Input files for isomiR analysis with the isomiRs package will be inside each sample's mirbase folder. If mirdeep2 can run, the count table (counts_mirna_novel.tsv) for novel miRNAs will be inside the mirdeep2 folder or the final project folder. tdrmapper results will be inside each sample's tdrmapper folder or the final project folder.
adapters – The 3' end adapter that needs to be removed. For the NextFlex protocol you can add adapters: ["4N", "$3PRIME_ADAPTER"]. For any other options you can use resources: atropos: options: ["-u 4", "-u -4"].
species – Three-letter code indicating the species in the miRBase classification (e.g. hsa for human).
aligner – Currently STAR is the only aligner tested, although bowtie can be used as well.
expression_caller – A list of expression callers to turn on: trna, seqcluster, mirdeep2, mirge.
transcriptome_gtf – An optional GTF file of the transcriptome to use for seqcluster.
spikein_fasta – A FASTA file of spike-in sequences to quantitate.
umi_type: 'qiagen_smallRNA_umi' – Support for the Qiagen UMI small RNA-seq protocol.
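The options above might be combined in a sample configuration along the following lines. This is a minimal sketch, not a verified configuration: the surrounding layout (details, analysis: smallRNA-seq, description, files, genome_build, the lowercase aligner value, and placing resources at the top level of the sample file) is assumed from bcbio's general sample YAML structure, and all file names and paths are placeholders.

```yaml
# Hypothetical sample configuration; only the keys under "algorithm"
# and the atropos options come from the option list above.
details:
  - analysis: smallRNA-seq          # assumed analysis name
    description: sample1            # placeholder
    files: [sample1.fastq.gz]       # placeholder
    genome_build: hg19              # placeholder
    algorithm:
      aligner: star
      # NextFlex form; replace $3PRIME_ADAPTER with your adapter sequence
      adapters: ["4N", "$3PRIME_ADAPTER"]
      species: hsa
      expression_caller: [trna, seqcluster, mirdeep2, mirge]
      # Optional inputs, shown commented out with placeholder paths:
      # transcriptome_gtf: /path/to/transcriptome.gtf
      # spikein_fasta: /path/to/spikeins.fa
      # umi_type: qiagen_smallRNA_umi

# To pass other trimming options to atropos instead, a resources
# section of this shape could be added (shown for syntax only):
# resources:
#   atropos:
#     options: ["-u 4", "-u -4"]
```

The optional keys and the atropos block are left commented out because the text does not say which combinations are meant to be used together; enable only the ones that match your library preparation protocol.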
counts_mirna.tsv – miRBase miRNA count matrix.
counts.tsv – miRBase isomiRs count matrix. The ID is made of 5 tags: miRNA name, SNPs, additions, trimming at the 5' end, and trimming at the 3' end. A detailed explanation of the naming is available here.
counts_mirna_novel.tsv – miRDeep2 miRNA count matrix.
counts_novel.tsv – miRDeep2 isomiRs count matrix. See the counts.tsv explanation for more detail.
seqcluster – output of the seqcluster tool. Inside this folder, counts.tsv has the count matrix for all clusters found over the genome.
seqclusterViz – input file for the interactive browser at https://github.com/lpantano/seqclusterViz
report – Rmd template to help with downstream analysis like QC metrics, differential expression, and clustering.
SAMPLE-mirbase-ready.counts – counts for miRBase miRNAs.
SAMPLE-novel-ready – counts for miRDeep2 novel miRNAs.
tRNA – output for tdrmapper.