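The miRDeep2 folder ships with its own tutorial; this post learns miRDeep2 by working through that example.
The tutorial_dir folder contains the following files; .fa denotes FASTA format.
cel_cluster.fa: # genome file of the species under study
mature_ref_this_species.fa: # mature miRNAs of the species under study, downloadable from miRBase
mature_ref_other_species.fa: # mature miRNAs of related species, downloadable from miRBase
precursors_ref_this_species.fa: # miRNA precursors of the species under study, downloadable from miRBase
reads.fa: # deep sequencing reads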
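Before running anything, it can help to confirm that each input really parses as FASTA. The following is a minimal Python sketch, not part of miRDeep2; the function name `read_fasta` is made up for illustration, and it assumes the record ID is the first word after `>`.

```python
from typing import Dict, Iterable


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA text into {record_id: sequence}; wrapped sequences are joined."""
    records: Dict[str, str] = {}
    current = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            parts = line[1:].split()
            current = parts[0] if parts else ""  # ID = first word of the header
            records[current] = ""
        elif current is not None:
            records[current] += line  # append continuation lines
    return records


if __name__ == "__main__":
    demo = [">read_1 example", "ACGT", "ACGT", ">read_2", "TTGA"]
    recs = read_fasta(demo)
    print(len(recs), "records")
```

To use it on a real file, pass an open file handle, for example `read_fasta(open("reads.fa"))`.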
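~~~~~~~~~~Step 1~~~~~~~~~
# Build an index of the genome file with bowtie-build
bowtie-build cel_cluster.fa cel_cluster
# cel_cluster.fa is the genome file; cel_cluster is the prefix of the index files.
# The prefix can be any string and does not have to match the genome file name.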
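Because the prefix is arbitrary, it is easy to pass mapper.pl a prefix that does not match what bowtie-build wrote. A small sketch that checks for the six files a bowtie (version 1) small index normally consists of; the helper name `missing_index_files` is hypothetical, and the suffix list assumes a small index (large genomes use a different extension).

```python
import os
import tempfile
from typing import List

# File suffixes written by bowtie-build (bowtie 1, small index).
EBWT_SUFFIXES = [
    ".1.ebwt", ".2.ebwt", ".3.ebwt", ".4.ebwt",
    ".rev.1.ebwt", ".rev.2.ebwt",
]


def missing_index_files(prefix: str) -> List[str]:
    """Return the expected index files that do not exist for this prefix."""
    return [prefix + s for s in EBWT_SUFFIXES if not os.path.exists(prefix + s)]


if __name__ == "__main__":
    # Demonstration in a scratch directory with only one index file present.
    with tempfile.TemporaryDirectory() as d:
        prefix = os.path.join(d, "cel_cluster")
        open(prefix + ".1.ebwt", "w").close()
        print(len(missing_index_files(prefix)), "index files missing")
```

An empty list means the prefix points at a complete index.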
~~~~~~~~~~Step 2~~~~~~~~~
# Process the reads file and map it to the genome
perl mapper.pl reads.fa -c -j -k TCGTATGCCGTCTTCTGCTTGT -l 18 -m -p cel_cluster -s reads_collapsed.fa -t reads_collapsed_vs_genome.arf -v
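Parameter explanation
-c the input file is in FASTA format; related options are -a (seq.txt format), -b (qseq.txt format), -e (fastq format) and -d (config file)
-j discard reads containing non-canonical letters (any letter other than a,c,g,t,u,n,A,C,G,T,U,N)
-k clip the 3' adapter; followed by the adapter sequence, TCGTATGCCGTCTTCTGCTTGT in the example
-l discard reads shorter than the given length; the example discards reads shorter than 18 nt
-m collapse the reads (identical reads are merged into one entry)
-p map the processed reads to a previously indexed genome, cel_cluster in the example
-s write the processed reads to the given file, reads_collapsed.fa in the example
-t write the mapping result to the given file, reads_collapsed_vs_genome.arf in the example
-v print progress to the screen. See Note 1 for the difference: with -v the screen shows each mapper action (discarding, clipping, collapsing, trimming) in addition to the summary; without -v it shows only the summary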
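Parameters not used in the example
Processing/mapping parameters
-g prefix for the read IDs; the default is seq. Read IDs in both the -s and -t output files start with these three letters.
-h parse to FASTA format
-i convert the RNA alphabet to DNA (to map against the genome)
-q map with one mismatch in the seed (mapping takes longer)
-r maximum number of genome positions a read may map to; the default is 5
-u do not remove the directory with temporary files
-n overwrite existing files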
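To make the effect of -j, -k, -l and -m concrete, here is a simplified Python sketch of the same four steps, in the order the -v output lists them in Note 1. It is an illustration, not mapper.pl's implementation: the adapter is clipped only at a full-length match, and the `prefix_index_xCOUNT` ID layout and the name `preprocess` are assumptions made for this example.

```python
from collections import Counter
from typing import List, Tuple

CANONICAL = set("acgtunACGTUN")


def preprocess(reads: List[str], adapter: str, min_len: int,
               prefix: str = "seq") -> List[Tuple[str, str]]:
    """Filter, clip and collapse reads; returns (read_id, sequence) pairs."""
    kept = []
    for seq in reads:
        # -j: discard the whole read if it contains a non-canonical letter
        if not set(seq) <= CANONICAL:
            continue
        # -k: clip the 3' adapter (simplified: cut at the first full match)
        pos = seq.find(adapter)
        if pos != -1:
            seq = seq[:pos]
        # -l: discard reads shorter than min_len
        if len(seq) < min_len:
            continue
        kept.append(seq)
    # -m: collapse identical reads, keeping the number of copies seen
    counts = Counter(kept)
    # most_common() sorts by count, highest first; ties keep first-seen order
    return [(f"{prefix}_{i}_x{n}", s)
            for i, (s, n) in enumerate(counts.most_common())]


if __name__ == "__main__":
    adapter = "TCGTATGCCGTCTTCTGCTTGT"
    demo = ["A" * 20 + adapter, "A" * 20 + adapter, "ACGT" + adapter]
    for read_id, seq in preprocess(demo, adapter, 18):
        print(read_id, seq)
```

Keeping the copy number in the ID means later steps can recover read counts from the collapsed file without re-reading the raw reads.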
~~~~~~~~~~Step 3~~~~~~~~~
# Fast quantitation of reads mapping to known miRBase precursors.
# (This step is not required for identification of known and novel miRNAs in the deep sequencing data when using miRDeep2.pl.)
quantifier.pl -p precursors_ref_this_species.fa -m mature_ref_this_species.fa -r reads_collapsed.fa -t cel -y 16_19
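Parameter explanation
-p miRNA precursor file, downloadable from miRBase
-m mature miRNA sequence file, downloadable from miRBase
-r reads file
-t species; if given, only data for that species is considered in the analysis. If omitted, all species are analyzed
-y [time] optional timestamp used in the output names (16_19 in the example); if omitted, a new one is generated
Screen output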
getting samples and corresponding read numbers
seq 374333 reads
Converting input files
building bowtie index
mapping mature sequences against index
# reads processed: 174
# reads with at least one reported alignment: 6 (3.45%)
# reads that failed to align: 168 (96.55%)
Reported 6 alignments to 1 output stream(s)
mapping read sequences against index
# reads processed: 1505
# reads with at least one reported alignment: 1088 (72.29%)
# reads that failed to align: 417 (27.71%)
Reported 1099 alignments to 1 output stream(s)
analyzing data
6 mature mappings to precursors
Expressed miRNAs are written to expression_analyses/expression_analyses_16_19/miRNA_expressed.csv
not expressed miRNAs are written to
expression_analyses/expression_analyses_16_19/miRNA_not_expressed.csv
Creating miRBase.mrd file
after READS READ IN thing
make_html2.pl -q expression_analyses/expression_analyses_16_19/miRBase.mrd -k
mature_ref_this_species.fa -z -t C.elegans -y 16_19 -o -i
expression_analyses/expression_analyses_16_19/mature_ref_this_species_mapped.arf -l -m cel
miRNAs_expressed_all_samples_16_19.csv
miRNAs_expressed_all_samples_16_19.csv file with miRNA expression values
parsing miRBase.mrd file finished
creating PDF files
creating pdf for cel-mir-39 finished
creating pdf for cel-mir-40 finished
creating pdf for cel-mir-37 finished
creating pdf for cel-mir-36 finished
creating pdf for cel-mir-38 finished
creating pdf for cel-mir-41 finished
#
This produces several outputs: expression_16_19.html, the expression_analyses folder (which contains many files), miRNAs_expressed_all_samples_16_19.csv, and the pdfs_16_19 folder.
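The bowtie summary lines in the screen output are worth checking, because a very low alignment rate usually points to a wrong index or reference file. A small sketch that pulls the three counts out of one summary block; the function name `parse_bowtie_summary` is made up, and it only reads the first block in the text it is given.

```python
import re
from typing import Dict

PATTERNS = {
    "processed": r"# reads processed: (\d+)",
    "aligned": r"# reads with at least one reported alignment: (\d+)",
    "failed": r"# reads that failed to align: (\d+)",
}


def parse_bowtie_summary(text: str) -> Dict[str, int]:
    """Extract read counts from one bowtie summary block."""
    counts: Dict[str, int] = {}
    for key, pattern in PATTERNS.items():
        match = re.search(pattern, text)  # first occurrence only
        if match:
            counts[key] = int(match.group(1))
    return counts


if __name__ == "__main__":
    # Summary copied from the read-mapping block of the screen output.
    block = """# reads processed: 1505
# reads with at least one reported alignment: 1088 (72.29%)
# reads that failed to align: 417 (27.71%)
Reported 1099 alignments to 1 output stream(s)"""
    c = parse_bowtie_summary(block)
    rate = 100 * c["aligned"] / c["processed"]
    print(f"{rate:.2f}% of reads aligned")
```

The aligned and failed counts should add up to the processed count, which makes a cheap sanity check on a captured log.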
~~~~~~~~~~Step 4~~~~~~~~~
# Identify known and novel miRNAs in the deep sequencing data
miRDeep2.pl reads_collapsed.fa cel_cluster.fa reads_collapsed_vs_genome.arf mature_ref_this_species.fa mature_ref_other_species.fa precursors_ref_this_species.fa -t C.elegans 2> report.log
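# reads_collapsed.fa: the reads processed by mapper.pl
# cel_cluster.fa: the genome file
# reads_collapsed_vs_genome.arf: the mapping result
# mature_ref_this_species.fa: mature miRNAs of the species under study, downloadable from miRBase
# mature_ref_other_species.fa: mature miRNAs of related species, downloadable from miRBase
# precursors_ref_this_species.fa: miRNA precursors of the species under study, downloadable from miRBase
# If you only have the reads, the arf file and the genome file, write none in place of each missing file (miRNAs_ref/none miRNAs_other/none precursors/none): no mature miRNAs for this species, none for related species, and no precursors.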
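The positional arguments and the none placeholder can be assembled programmatically, which avoids shifting the argument order by accident when a reference file is missing. This sketch only builds the argument list in the order shown above and does not run anything; the function name `build_mirdeep2_cmd` and its keyword names are hypothetical.

```python
from typing import List, Optional


def build_mirdeep2_cmd(reads: str, genome: str, arf: str,
                       mature_this: Optional[str] = None,
                       mature_other: Optional[str] = None,
                       precursors: Optional[str] = None,
                       species: Optional[str] = None) -> List[str]:
    """Build the miRDeep2.pl argument list, writing 'none' for missing files."""
    def or_none(path: Optional[str]) -> str:
        return path if path else "none"

    cmd = ["miRDeep2.pl", reads, genome, arf,
           or_none(mature_this), or_none(mature_other), or_none(precursors)]
    if species:
        cmd += ["-t", species]  # -t: species, as in the tutorial command
    return cmd


if __name__ == "__main__":
    # Only reads, genome and arf available: the three references become 'none'.
    print(" ".join(build_mirdeep2_cmd(
        "reads_collapsed.fa", "cel_cluster.fa", "reads_collapsed_vs_genome.arf")))
```

The resulting list could be handed to `subprocess.run` with stderr redirected to report.log, mirroring the `2> report.log` in the command above.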
Parameter notes
-t species
2> report.log redirects standard error to report.log, so all the steps are logged in that file
# Screen output
#####################################
# #
# miRDeep2 #
# #
# last change: 07/07/2011 #
# #
#####################################
miRDeep2 started at 19:44:43
#Starting miRDeep2
#testing input files
#Quantitation of known miRNAs in data
#parsing genome mappings
#excising precursors
#preparing signature
#folding precursors
#computing randfold p-values
#running miRDeep core algorithm
#running permuted controls
#doing survey of accuracy
#producing graphic results
miRDeep runtime:
started: 19:44:43
ended: 19:46:15
total:0h:1m:32s
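The total is simply the difference between the two clock times. As a worked check, the sketch below recomputes it from the started and ended values; the function name `elapsed` is made up, and it assumes the run took less than 24 hours.

```python
from datetime import datetime


def elapsed(start: str, end: str) -> str:
    """Return the time between two HH:MM:SS clock readings as 'Hh:Mm:Ss'."""
    fmt = "%H:%M:%S"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    # Modulo one day so that a run crossing midnight still gives a positive value.
    secs = int(delta.total_seconds()) % 86400
    hours, rem = divmod(secs, 3600)
    minutes, seconds = divmod(rem, 60)
    return f"{hours}h:{minutes}m:{seconds}s"


if __name__ == "__main__":
    print(elapsed("19:44:43", "19:46:15"))  # prints 0h:1m:32s
```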
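~~~~~~~~~~Step 5~~~~~~~~~
# Browse the results
Open the .html file in a browser.
Note that cel-miR-37 is predicted twice, because two potential precursors at this locus can both fold into hairpin structures. However, the annotated hairpin scores far higher than the unannotated one (miRDeep2 score 6.1e+4 vs. -0.2).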
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~Note 1~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
######With -v###the screen output is as follows####
discarding sequences with non-canonical letters
clipping 3' adapters
discarding short reads
collapsing reads
mapping reads to genome index
# reads processed: 1609
# reads with at least one reported alignment: 470 (29.21%)
# reads that failed to align: 1139 (70.79%)
Reported 480 alignments to 1 output stream(s)
trimming unmapped nts in the 3' ends
######Without -v###the screen output is as follows####
# reads processed: 1609
# reads with at least one reported alignment: 470 (29.21%)
# reads that failed to align: 1139 (70.79%)
Reported 480 alignments to 1 output stream(s)
~~~~~~~~~~~~~~Note 1~~~~~~~~~~~~~~~~~
Original posts: http://blog.sina.com.cn/s/blog_7cffd1400101m3i3.html http://blog.sina.com.cn/s/blog_7cffd1400100twvb.html