DNA sequence (Mapping + BFS)

Problem Description

The twenty-first century is a biology-technology developing century. We know that a gene is made of DNA. The nucleotide bases from which DNA is built are A (adenine), C (cytosine), G (guanine), and T (thymine). Finding the longest common subsequence between DNA/protein sequences is one of the basic problems in modern computational molecular biology. But this problem is a little different. Given several DNA sequences, you are asked to make a shortest sequence from them so that each of the given sequences is a subsequence of it.

For example, given "ACGT", "ATGC", "CGTT" and "CAGT", you can make a sequence in the following way. It is the shortest but may not be the only one.
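As a quick sanity check on the example, a candidate answer can be verified with a plain subsequence test. The sketch below is my own illustration: the helper name `is_subsequence` and the candidate string "ACAGTGCT" are assumptions, not taken from the problem statement, and the candidate is just one length-8 sequence that happens to contain all four inputs.

```cpp
#include <string>

// Hypothetical helper: true if `seq` can be obtained from `super`
// by deleting zero or more characters (order preserved).
bool is_subsequence(const std::string& seq, const std::string& super) {
    std::size_t matched = 0; // characters of seq matched so far
    for (char c : super) {
        if (matched < seq.size() && seq[matched] == c) {
            ++matched;
        }
    }
    return matched == seq.size();
}
```

Calling `is_subsequence` with each of "ACGT", "ATGC", "CGTT" and "CAGT" against "ACAGTGCT" returns true, which agrees with the sample answer of 8.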

Input

The first line is the test case number t. Then t test cases follow. In each case, the first line is an integer n (1 <= n <= 8), the number of DNA sequences. The following n lines contain the n sequences, one per line. The length of each sequence is between 1 and 5.

Output

For each test case, print a line containing the length of the shortest sequence that can be made from these sequences.

Sample Input

1
4
ACGT
ATGC
CGTT
CAGT

Sample Output

8

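The task: you are given several DNA sequences and must find a sequence such that every given sequence is a subsequence of it (not necessarily contiguous).
A direct search gets MLE, TLE, or RE, so brute force is out. The usual way to handle this kind of sequence problem is to map the sequences onto an integer, or onto something else that is easy to work with.
The problem also guarantees that no DNA sequence is longer than 5, so we can treat each sequence as one digit and pack all of them into a single integer. And since only the length of the shortest sequence has to be printed, there is no need to map the characters themselves; mapping how many characters of each sequence have been matched is enough.
There are at most 8 sequences, each of length 1 to 5, so the number of states is at most 6^8. Why 6^8 and not 5^8? Each digit stores how many characters of one sequence have been matched so far, which ranges from 0 to 5, six values in total. The code comments come back to this.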
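To make the mapping concrete, here is a minimal sketch of packing the per-sequence matched lengths into one base-6 integer and unpacking them again. The function names `encode` and `decode` are hypothetical and do not appear in the solution below; they only illustrate the idea under the assumption that every matched length lies between 0 and 5.

```cpp
#include <cassert>
#include <vector>

// Pack matched lengths (each in 0..5) into one base-6 integer.
// Digit j (weight 6^j) holds the matched length of sequence j.
int encode(const std::vector<int>& matched) {
    int state = 0;
    int weight = 1;
    for (int m : matched) {
        assert(m >= 0 && m <= 5); // assumption: sequence length is at most 5
        state += m * weight;
        weight *= 6;
    }
    return state;
}

// Unpack the first n base-6 digits of a state.
std::vector<int> decode(int state, int n) {
    std::vector<int> matched(n);
    for (int j = 0; j < n; ++j) {
        matched[j] = state % 6;
        state /= 6;
    }
    return matched;
}
```

For the sample, all four sequences have length 4, so the target state is `encode({4, 4, 4, 4})`, which is 4 + 24 + 144 + 864 = 1036. With eight sequences of length 5, the largest state is 6^8 - 1 = 1679615.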
Code:
#include <cstring>
#include <iostream>
#include <map>
#include <queue>
#include <string>

using namespace std;

int t, n;
map<int, int> vis;  // states already visited
char s[10][10];     // the sequences, stored 1-based: s[i][1..len[i]]
int len[10];        // length of each sequence
int p[10] = {1, 6, 36, 216, 1296, 7776, 46656, 279936, 1679616, 10077696}; // powers of 6
char temp[4] = {'A', 'C', 'G', 'T'};

struct node {
    int step; // length of the sequence built so far
    int st;   // the mapped (base-6) state
    node() {}
    node(int _step, int _st) : step(_step), st(_st) {}
};

int bfs(int res) {
    vis.clear();
    queue<node> q;
    q.push(node(0, 0));
    vis[0] = 1;
    while (!q.empty()) {
        node nxt, k = q.front();
        q.pop();
        if (k.st == res) { // the state equals the target: return the length
            return k.step;
        }
        for (int i = 0; i < 4; i++) { // try appending each of the four bases
            nxt.st = 0;
            nxt.step = k.step + 1;
            int tp = k.st;
            for (int j = 1; j <= n; j++) {
                int x = tp % 6; // digit j: characters of sequence j matched so far
                tp /= 6;
                if (x == len[j] || s[j][x + 1] != temp[i]) { // finished, or next character does not match
                    nxt.st += x * p[j - 1];
                } else {
                    nxt.st += (x + 1) * p[j - 1];
                }
            }
            if (vis[nxt.st] == 0) { // skip states that were already searched
                q.push(nxt);
                vis[nxt.st] = 1;
            }
        }
    }
    return -1; // not reached for valid input; avoids falling off the end of the function
}

int main() {
    ios_base::sync_with_stdio(false);
    cin.tie(0);
    cin >> t;
    while (t--) {
        cin >> n;
        int res = 0;
        for (int i = 1; i <= n; i++) { // arrays count from 0, but the mapping and everything after it is position based, so start from 1
            string line;
            cin >> line;
            strcpy(s[i] + 1, line.c_str()); // likewise, characters start at index 1
            len[i] = (int)line.size();
            res += len[i] * p[i - 1]; // this is why it is 6^8: positions count from 1, so a digit ranges over 0..5, not 0..4
        }
        cout << bfs(res) << endl;
    }
    return 0;
}

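So if you insist on working from position 0, going with 5^8 is not wrong and can also be made to work, but the handling gets a lot more cumbersome. It is easier to spend one extra value per digit for the sake of convenience.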

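The hard part of this problem is knowing where to start. Even if you know a transformation is needed, it is not obvious what to transform into or how to search afterwards. Here we avoid searching over the characters of the answer and search directly over matched lengths instead.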
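One way to picture "searching over lengths" is to isolate a single BFS transition: append one base, and every sequence whose next unmatched character equals that base moves forward by one. The sketch below is an illustration only. The name `advance` is hypothetical, and unlike the solution above it uses 0-based `std::string` sequences, so the next character of a sequence with `matched` characters done is `seq[matched]`.

```cpp
#include <string>
#include <vector>

// Hypothetical single BFS step: given a base-6 state and an appended base,
// return the state after every sequence that can consume the base does so.
int advance(int state, char base, const std::vector<std::string>& seqs) {
    int next = 0;
    int weight = 1; // 6^j for sequence j
    for (const std::string& seq : seqs) {
        int matched = state % 6; // characters of this sequence matched so far
        state /= 6;
        if (matched < static_cast<int>(seq.size()) && seq[matched] == base) {
            ++matched; // the appended base matches the next character
        }
        next += matched * weight;
        weight *= 6;
    }
    return next;
}
```

With the sample sequences "ACGT", "ATGC", "CGTT", "CAGT" and the empty state 0, appending 'A' advances the first two sequences and gives 1 + 6 = 7, while appending 'C' advances the last two and gives 36 + 216 = 252. BFS over these transitions reaches the target state in the minimum number of appended characters.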

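It is worth mentioning that when I asked my teammates, they said there are many ways to solve this problem: IDA* or another heuristic search works, and it can even be done without searching at all, using an AC automaton plus matrices. Those approaches all search over characters, though. It is hard to say which is better or worse; the way of thinking is simply different. Many problems have more than one solution, and thinking about them a bit more is very useful. As for the other approaches, I could not be bothered to write them (in truth I don't know how, lol).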
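For readers curious about the IDA* route, here is a rough sketch of how it might look. This is not the author's code and not necessarily what the teammates had in mind. It assumes a common choice of heuristic, the longest unmatched remainder among all sequences, and every name in it (`remaining`, `dfs`, `shortest_supersequence_length`) is hypothetical.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Lower bound on characters still needed: the longest unmatched remainder.
static int remaining(const std::vector<std::string>& seqs, const std::vector<int>& pos) {
    int h = 0;
    for (std::size_t j = 0; j < seqs.size(); ++j) {
        h = std::max(h, static_cast<int>(seqs[j].size()) - pos[j]);
    }
    return h;
}

// Depth-limited DFS: can all sequences be finished within `limit` characters?
static bool dfs(const std::vector<std::string>& seqs, const std::vector<int>& pos,
                int depth, int limit) {
    int h = remaining(seqs, pos);
    if (h == 0) return true;              // every sequence fully matched
    if (depth + h > limit) return false;  // prune: the bound exceeds the limit
    for (char c : std::string("ACGT")) {
        std::vector<int> next = pos;
        bool moved = false;
        for (std::size_t j = 0; j < seqs.size(); ++j) {
            if (pos[j] < static_cast<int>(seqs[j].size()) && seqs[j][pos[j]] == c) {
                ++next[j];
                moved = true;
            }
        }
        // Appending a base that advances nothing can never help.
        if (moved && dfs(seqs, next, depth + 1, limit)) return true;
    }
    return false;
}

// Iterative deepening: raise the limit until the DFS succeeds.
int shortest_supersequence_length(const std::vector<std::string>& seqs) {
    std::vector<int> start(seqs.size(), 0);
    for (int limit = remaining(seqs, start);; ++limit) {
        if (dfs(seqs, start, 0, limit)) return limit;
    }
}
```

The loop always terminates, because concatenating all the sequences gives a valid answer, so the limit never needs to exceed the total input length. Compared with the BFS above, this variant keeps no visited table and trades memory for repeated work at each depth limit.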
