Tutorial: Using RNAhybrid, a miRNA Binding-Site Prediction Tool

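Introduction to RNAhybrid

RNAhybrid is a miRNA target prediction tool developed by Rehmsmeier M et al., based on the secondary structure of the miRNA/target duplex. The algorithm forbids intramolecular base pairing as well as pairing between miRNA molecules or between target molecules, and it locates the best target sites according to the hybridization energy between the miRNA and the target. Although the running time grows with the length of the target sequence, RNAhybrid still has a clear speed advantage over other RNA secondary structure prediction programs such as mfold, RNAfold, RNAcofold and pairfold. RNAhybrid also lets the user set thresholds for the free energy and the p-value, and constrain the hybridization site, for example by requiring it to include nucleotides 2-7 at the 5' end of the miRNA.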

1. Downloading and installing RNAhybrid

wget https://bibiserv.cebitec.uni-bielefeld.de/applications/rnahybrid/resources/downloads/RNAhybrid-2.1.2.tar.gz
tar -xzvf RNAhybrid-2.1.2.tar.gz
cd /path/to/RNAhybrid-2.1.2
./configure
sudo make  # the author recommends running this with administrator rights, otherwise errors are likely
sudo make install

To verify the installation, run which RNAhybrid. If it prints a path, the installation succeeded. I tested this on Ubuntu under WSL on Windows 10.
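The same check can be scripted. This is only a minimal sketch: check_tool is a made-up helper name, and command -v is used here as a portable stand-in for which.

```shell
#!/bin/sh
# check_tool is a hypothetical helper (not part of RNAhybrid): it prints the
# path of a program if it is found on PATH and returns non-zero otherwise.
check_tool() {
    if path=$(command -v "$1" 2>/dev/null); then
        echo "$1 found at: $path"
        return 0
    else
        echo "$1 not found on PATH" >&2
        return 1
    fi
}

# Report the result without aborting the script if RNAhybrid is missing.
check_tool RNAhybrid || echo "RNAhybrid does not appear to be installed"
```

If the program is reported as missing right after sudo make install, the install directory is probably not on PATH in the current shell.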

 

2. Preparing the input files

1. Target sequence(s)

This contains one or more sequences that RNAhybrid hybridizes the miRNA(s) to. RNAhybrid uses all of these sequences to find minimum free energy hybridizations between the miRNA(s) and the target sequence(s). Sequences should be in RNA FASTA format, but RNAhybrid can also use DNA FASTA files. A single sequence can contain up to 50000 nucleotides.

The target sequences used here are the human circRNA FASTA file downloaded from circBase. For the download procedure, see my blog post: https://www.cnblogs.com/yanjiamin/p/12057362.html

2. miRNA sequence(s)

This contains one or more microRNAs that RNAhybrid hybridizes with the target sequences to find the minimum free energy hybridization. A single microRNA sequence can contain up to 2000 nucleotides.

The miRNA sequences used here are the mature human miRNA FASTA file downloaded from miRBase. For the download procedure, see my blog post: https://www.cnblogs.com/yanjiamin/p/12057362.html
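Before running RNAhybrid it can help to check how long the sequences in each FASTA file are, given the length limits quoted above. The following is a rough sketch using awk; fasta_lengths is a made-up helper name, and the small demo file is invented purely for illustration.

```shell
#!/bin/sh
# fasta_lengths is a hypothetical helper: for every record in a FASTA file it
# prints the sequence name (first word of the header) and the sequence length,
# separated by a tab.
fasta_lengths() {
    awk '
        /^>/ {
            if (seen) print name "\t" len   # flush the previous record
            seen = 1
            name = substr($1, 2)            # drop the leading ">"
            len = 0
            next
        }
        seen {
            gsub(/[ \t\r]/, "")             # ignore whitespace and CR characters
            len += length($0)
        }
        END { if (seen) print name "\t" len }
    ' "$1"
}

# Demo on a tiny made-up file (not real data).
tmp=$(mktemp)
printf '>seq1 demo\nACGU\nAC\n>seq2\nUUU\n' > "$tmp"
fasta_lengths "$tmp"
rm -f "$tmp"
```

To list only the records above a limit, the output can be filtered, for example with fasta_lengths targets.fa | awk -F '\t' '$2 > 50000', where targets.fa stands for your own target file.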

 

3. Using RNAhybrid

Usage: RNAhybrid [options] [target sequence] [query sequence].

options:

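-b <number of hits per target>  # maximum number of hits reported for one miRNA on one target sequence. If a miRNA can bind a target in several ways, -b 1 reports only the best hit; 1 is usually a good choice. The number of hits actually reported also depends on the <energy cut-off>.
-c compact output  # each hit is printed on a single line. Recommended if you only want to compare the results against those from RNAhybrid's calibration.
-d <xi>,<theta>  # position and shape parameters
-f helix constraint  # given as "from,to", e.g. -f 2,7
-h help
-m <max targetlength>
-n <max query length>
-u <max internal loop size (per side)>  # maximum number of unpaired nucleotides on each side of an internal loop; -u 0 yields structures without any internal loops.
-v <max bulge loop size>  # an internal loop has unpaired nucleotides on both strands, whereas a bulge loop has extra unpaired nucleotides on only one strand
-e <energy cut-off>  # free energy threshold for a hit; try -e -30 first and see what you get.
-p <p-value cut-off>
-s (3utr_fly|3utr_worm|3utr_human)  # used for a quick estimate of the extreme value distribution parameters; choose 3utr_fly, 3utr_worm or 3utr_human (or nothing) to better match these species. You cannot use the helix constraint and approximate p-values at the same time.
-g (ps|png|jpg|all)  # format of the graphical output: ps, png, jpg or all
-t <target file>  # target gene file in FASTA format
-q <query file>  # miRNA file in FASTA format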

Either a target file has to be given (FASTA format)
or one target sequence directly.

Either a query file has to be given (FASTA format)
or one query sequence directly.

The helix constraint format is "from,to", eg. -f 2,7 forces
structures to have a helix from position 2 to 7 with respect to the query.

<xi> and <theta> are the position and shape parameters, respectively,
of the extreme value distribution assumed for p-value calculation.
If omitted, they are estimated from the maximal duplex energy of the query.
In that case, a data set name has to be given with the -s flag.


PS graphical output not supported.


PNG and JPG graphical output not supported.

 

Name Description

helix constraint from

Forces all structures to have a helix from position a to position b with respect to the query. The first base has position 1. The parameter "helix constraint from" has to be lower than or equal to the parameter "helix constraint to". You cannot use the helix constraint and approximate p-values at the same time.

hits per target

Defines how many hits RNAhybrid shows. The hits are listed in order of increasing minimum free energy (the lower the energy, the better the result).

compact output

When this parameter is used, RNAhybrid gives you only one line of output per hit instead of the whole output it normally generates.

generate graphics

Generates a graphical representation of the output in jpg, png and ps format if fewer than 6 hits are chosen. If RNAhybrid stops with an unexpected error, it is often a good idea to disable the graphical output.

max internal loop length

The maximum number of unpaired nucleotides on either side of an internal loop.

energy threshold

Shows all hits whose minimum free energy is lower than the threshold (the lower the energy, the better the result). The value has to be lower than or equal to zero.

Note that the output is limited by both the energy threshold and the maximum number of hits per target.

max bulge loop length

The maximum number of unpaired nucleotides in a bulge loop.

no G:U in seed

Selects whether G:U pairs are disallowed in the seed. This parameter can only be chosen if you also use the parameters "helix constraint from" and "helix constraint to".

helix constraint to

See "helix constraint from"; this is position b. You have to set both parameters to use a helix constraint.

approximate p-value

Used for a quick estimate of the extreme value distribution parameters. You can choose between nothing, 3utr_fly, 3utr_worm and 3utr_human to better fit these species. You cannot use the helix constraint and approximate p-values at the same time.

 

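4. Criteria for predicting human miRNA target sites with RNAhybrid

1. Nucleotides 8 to 12 of the miRNA must pair perfectly with the circRNA. The option for this is -f (helix constraint), i.e. -f 8,12

2. An internal loop is a loop formed by mismatches on both strands. Neither strand may have more than 1 unpaired nucleotide in such a loop. The option for this is -u <max internal loop size (per side)>, i.e. -u 1

3. A bulge loop is a bulge formed by extra nucleotides on one strand. A bulge may contain at most one nucleotide. The option for this is -v <max bulge loop size>, i.e. -v 1

4. G:U pairs are allowed. This is the default; you can disallow G:U pairs with the "no G:U in seed" setting

5. Unpaired overhangs at the ends must not exceed two nucleotides

6. Three consecutive mismatches are not allowed

7. No more than 4 mismatches in total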

RNAhybrid -g jpg -b 1 -e -20 -f 8,12 -u 1 -v 1 -s 3utr_human -t SFTSV_24vscontrol_DEcircBase.fa -q hsa_miRNA.fa > SFTSV_24bscontrol_circRNA_miRNA_RNAhybrid  # the output is printed straight to the terminal, so it is best to redirect it to a file with ">"

 

 

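Although I set -g jpg, RNAhybrid did not produce any jpg files, and I am not sure why. One possible explanation is in the help output above: it reports "PNG and JPG graphical output not supported", which suggests that this build was compiled without graphics support.

The results then need to be reorganized into a data frame with the columns circRNA and miRNA (one circRNA/miRNA pair per row), which can be used to draw the ceRNA network in Cytoscape.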
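One way to get such a table is to rerun the prediction with -c and cut the pair names out of the compact output. The sketch below rests on an assumption that you should check against your own output: that each compact line is colon-separated, with the target name in field 1 and the query name in field 3 (target name, target length, query name, query length, mfe, p-value, and so on). compact_to_pairs is a made-up helper name. Sequence names that themselves contain a colon (for example coordinate-style names such as chr1:100-200) will break this parsing, so simplify the FASTA headers first if necessary.

```shell
#!/bin/sh
# compact_to_pairs is a hypothetical helper. It reads RNAhybrid -c output on
# stdin and prints a tab-separated table with the header "circRNA miRNA".
# ASSUMPTION: fields are colon-separated, target name is field 1 and
# query (miRNA) name is field 3. Verify this on your own output.
compact_to_pairs() {
    awk -F ':' '
        BEGIN { OFS = "\t"; print "circRNA", "miRNA" }
        # keep each target/query pair once, skipping lines with too few fields
        NF >= 6 && !seen[$1 OFS $3]++ { print $1, $3 }
    '
}

# Usage (file names are placeholders):
# compact_to_pairs < rnahybrid_compact.txt > circRNA_miRNA_pairs.tsv
```

The resulting tab-separated file can be imported into Cytoscape as a network table, with circRNA as the source column and miRNA as the target column.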
