smallRNA-seq

Overview

bcbio supports a configurable best-practices pipeline for smallRNA-seq that covers quality control, adapter trimming, miRNA/isomiR quantification and detection of other small RNAs.

bcbio yaml config example

upload:
  dir: ../final
details:
  - analysis: smallRNA-seq
    algorithm:
      aligner: star # bowtie, bowtie2 and hisat2 are also available
      # change the adapter to match your project
      adapters: ["TGGAATTCTCGGGTGC"]
      expression_caller: [trna, seqcluster, mirdeep2]
      # expression_caller: [trna, seqcluster, mirdeep2, mirge]
      # Read the docs before enabling the miRge tools:
      # https://bcbio-nextgen.readthedocs.io/en/latest/contents/pipelines.html#smallrna-seq
      species: hsa
    genome_build: hg19
#resources:
#  atropos: 
#    options: ["-u 4", "-u -4"]
#  mirge: 
#    options: ["-lib $PATH_TO_LIBS_FOLDER"]  

The pipeline runs the following steps:

  • Adapter trimming: atropos

  • Sequence alignment:

    • STAR for genome annotation

    • bowtie, bowtie2 and hisat2 for genome annotation as an option

  • Specific small RNAs quantification (miRNA/tRNAs…):

    • seqbuster for miRNA annotation

    • MINTmap for tRNA fragments annotation

    • miRge2 for alternative small RNA quantification. To set up this tool, install miRge2.0 manually and download the library data for your species. Read the miRge2.0 instructions on how to install it and download the data. The option passed to resources is the parent of the species folder: if the human folder is at /mnt/data/human, pass /mnt/data. Then set up resources:

      resources:
          mirge:
              options: ["-lib $PATH_TO_PARENT_SPECIES_LIB"]
      
  • Quality control: FastQC

  • Other small RNAs quantification:

    • seqcluster for small RNA clusters found over the genome

    • miRDeep2 for novel miRNAs
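
As a concrete instance of the miRge2 setup described in the list above, and assuming the human library was downloaded to /mnt/data/human (the path used in the text), the resources block would look roughly like this:

```yaml
resources:
  mirge:
    # /mnt/data/human holds the human library,
    # so -lib points at its parent folder.
    options: ["-lib /mnt/data"]
```

Remember to add mirge to expression_caller as well, otherwise the option has no effect.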

The pipeline generates an Rmd template file inside the report folder that can be rendered with knitr. An example of the report is here. The count table for miRBase miRNAs (counts_mirna.tsv) will be inside the mirbase folder or the final project folder. Input files for isomiR analysis with the isomiRs package will be inside each sample's mirbase folder. If mirdeep2 can run, the count table for novel miRNAs (counts_mirna_novel.tsv) will be inside the mirdeep2 folder or the final project folder. tdrmapper results will be inside each sample's tdrmapper folder or the final project folder.

Parameters

  • adapters The 3’ end adapter that needs to be removed. For the NextFlex protocol you can add adapters: ["4N", "$3PRIME_ADAPTER"]. For any other trimming options you can use resources: atropos: options: ["-u 4", "-u -4"].

  • species Three-letter code that identifies the species in the miRBase classification (e.g. hsa for human).

  • aligner Currently STAR is the only aligner tested, although bowtie can be used as well.

  • expression_caller A list of expression callers to turn on: trna, seqcluster, mirdeep2, mirge

  • transcriptome_gtf An optional GTF file of the transcriptome to use with seqcluster.

  • spikein_fasta A FASTA file of spike-in sequences to quantify.

  • umi_type: 'qiagen_smallRNA_umi' Enables support for the Qiagen UMI small RNA-seq protocol.
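
The parameters above can be combined as in the following sketch for a NextFlex library. It reuses the adapter sequence from the example config purely as a placeholder for $3PRIME_ADAPTER, and both file paths are hypothetical. Placing transcriptome_gtf and spikein_fasta under algorithm is an assumption based on where the other parameters live in the example config.

```yaml
details:
  - analysis: smallRNA-seq
    algorithm:
      aligner: star
      # NextFlex: "4N" followed by the 3' adapter of your kit.
      # The sequence below is only the placeholder from the example config.
      adapters: ["4N", "TGGAATTCTCGGGTGC"]
      species: hsa
      expression_caller: [trna, seqcluster, mirdeep2]
      # Optional inputs; both paths are hypothetical.
      transcriptome_gtf: /path/to/transcriptome.gtf
      spikein_fasta: /path/to/spikeins.fa
    genome_build: hg19
```

For trimming needs that the adapters list does not cover, the atropos options under resources, as described for the adapters parameter, are the alternative route.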

Output

Project directory:

  • counts_mirna.tsv – miRBase miRNA count matrix.

  • counts.tsv – miRBase isomiRs count matrix. The ID is made of 5 tags: miRNA name, SNPs, additions, trimming at the 5’ end and trimming at the 3’ end. A detailed explanation of the naming is available here.

  • counts_mirna_novel.tsv – miRDeep2 miRNA count matrix.

  • counts_novel.tsv – miRDeep2 isomiRs count matrix. See the counts.tsv explanation for more detail.

  • seqcluster – output of seqcluster tool. Inside this folder, counts.tsv has count matrix for all clusters found over the genome.

  • seqclusterViz – input file for interactive browser at https://github.com/lpantano/seqclusterViz

  • report – Rmd template to help with downstream analysis like QC metrics, differential expression, and clustering.

Sample directories:

  • SAMPLE-mirbase-ready.counts – counts for miRBase miRNAs.

  • SAMPLE-novel-ready – counts for miRDeep2 novel miRNAs.

  • tRNA – output for tdrmapper.

References