Trie (Dictionary Tree)

Date: 2019-11-24


Source: https://www.cnblogs.com/justinh/p/7716421.html

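A trie is also commonly called a prefix tree or a dictionary tree. It has many variants, such as the suffix tree, the radix tree/trie, the PATRICIA tree, and the bitwise crit-bit tree. Many of these names overlap in meaning.

Definition

In computer science, a trie, also called a prefix tree or dictionary tree, is an ordered tree used to store an associative array whose keys are usually strings. Unlike a binary search tree, a key is not stored directly in a node; it is determined by the node's position in the tree. All descendants of a node share a common prefix, namely the string associated with that node, and the root corresponds to the empty string. In general, not every node has an associated value: only leaves and some internal nodes correspond to keys that carry values.

Trie keys are usually strings, but they can be other structures. The trie algorithms are easily adapted to any ordered sequence, such as a sequence of digits or a permutation of shapes. For example, the keys of a bitwise trie are sequences of bits, which can represent integers or memory addresses.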

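Basic properties

1. The root contains no character; every node other than the root contains exactly one character.

2. Concatenating the characters on the path from the root to a node gives the string associated with that node.

3. The children of any given node all contain different characters.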

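Advantages:

A trie keeps unnecessary string comparisons to a minimum, so it is well suited to word-frequency counting and to sorting large numbers of strings.

Compared with a hash table:

1. The worst-case time complexity is better than that of a hash table.

2. There are no collisions, unless one key maps to several values (the extra information stored besides the key).

3. Ordering is built in (similar to radix sort): a preorder traversal of the trie yields the keys in sorted order.

Disadvantages:

1. Although different words share prefixes, a trie still trades space for time. Each node may hold up to as many child pointers as there are characters in the alphabet (not counting satellite data).

The children of a node can be organised in several ways. (1) An array covering the whole alphabet by default gives fast lookups but wastes space, especially near the leaves at the bottom of the tree. (2) A linked representation (such as left-child right-sibling) saves space, but a lookup has to scan (part of) the sibling list sequentially. (3) Alphabet reduction: use narrower characters so that the alphabet has fewer symbols. (4) Use a bitmap over the alphabet, combined with the linked representation.

2. If the data lives on slow storage such as external memory, a trie is slower than a hash table (a hash lookup touches external storage O(1) times, a trie O(tree height) times).

3. Keys such as long floating-point numbers produce very long chains. A bitwise trie can improve on this.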
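The linked (left-child right-sibling) layout mentioned under disadvantage 1 can be sketched as follows. This is only an illustration, not code from the original post: `LcrsNode`, `findChild`, `lcrsInsert`, `lcrsContains` and `lcrsFree` are names invented here. Each node stores two pointers regardless of alphabet size, and finding a child means scanning the sibling list.

```cpp
#include <string>

// Hypothetical sketch of the "left child, right sibling" layout: memory is
// proportional to the number of nodes, not to nodes * alphabet size.
struct LcrsNode {
    char ch;
    bool isWord;
    LcrsNode* firstChild;
    LcrsNode* nextSibling;
    explicit LcrsNode(char c)
        : ch(c), isWord(false), firstChild(nullptr), nextSibling(nullptr) {}
};

// Find the child of p labelled c by walking the sibling list sequentially.
LcrsNode* findChild(LcrsNode* p, char c) {
    for (LcrsNode* q = p->firstChild; q; q = q->nextSibling)
        if (q->ch == c) return q;
    return nullptr;
}

void lcrsInsert(LcrsNode* root, const std::string& s) {
    LcrsNode* p = root;
    for (char c : s) {
        LcrsNode* q = findChild(p, c);
        if (!q) {
            q = new LcrsNode(c);
            q->nextSibling = p->firstChild;  // push to the front of the sibling list
            p->firstChild = q;
        }
        p = q;
    }
    p->isWord = true;
}

bool lcrsContains(LcrsNode* root, const std::string& s) {
    LcrsNode* p = root;
    for (char c : s) {
        p = findChild(p, c);
        if (!p) return false;
    }
    return p->isWord;
}

// Release the whole tree (children first, then siblings).
void lcrsFree(LcrsNode* p) {
    if (!p) return;
    lcrsFree(p->firstChild);
    lcrsFree(p->nextSibling);
    delete p;
}
```

A caller would create the root with `new LcrsNode('\0')`, insert words, and release everything with `lcrsFree`. The price of the compact node is the linear scan in `findChild`.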

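Bitwise trie

It is similar to an ordinary trie, except that the alphabet is a single bit, so each node has only two children.

It can be used for address allocation, routing management and similar tasks.

Although keys are stored and compared bit by bit, performance is high because the structure is cache-local and highly parallelisable. A red-black tree looks faster on paper, but it is cache-unfriendly and mostly serial, so its bottleneck is memory access latency rather than CPU speed.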
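As a rough illustration of the two-child structure, here is a minimal sketch of a bitwise trie over 32-bit unsigned keys. The type `BitTrie` and its members are assumptions made for this example; a real routing table would additionally store prefix lengths and next hops, which are left out here.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical sketch: node 0 is the root; next[b] == 0 means "no child for bit b".
struct BitTrie {
    struct Node { int next[2]; bool present; };
    std::vector<Node> nodes;

    BitTrie() { nodes.push_back(Node()); }  // value-initialised: all zeros

    void insert(std::uint32_t key) {
        int cur = 0;
        for (int bit = 31; bit >= 0; --bit) {  // most significant bit first
            int b = static_cast<int>((key >> bit) & 1u);
            if (!nodes[cur].next[b]) {
                nodes.push_back(Node());
                nodes[cur].next[b] = static_cast<int>(nodes.size()) - 1;
            }
            cur = nodes[cur].next[b];
        }
        nodes[cur].present = true;
    }

    bool contains(std::uint32_t key) const {
        int cur = 0;
        for (int bit = 31; bit >= 0; --bit) {
            int b = static_cast<int>((key >> bit) & 1u);
            cur = nodes[cur].next[b];
            if (!cur) return false;  // path breaks: key was never inserted
        }
        return nodes[cur].present;
    }
};
```

Every key walks exactly 32 levels, so the depth is fixed by the key width rather than by the number of stored keys.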

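Compressed trie

Conditions under which compressing branches pays off:

1. The trie is essentially static.

2. It is only queried.

3. Keys are independent of the node-specific data.

4. Branches are sparse.

If insertions and deletions are allowed, nodes may have to be split and merged. In that case a trade-off between compression ratio and update (split/merge) frequency is needed.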

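External-memory trie

Some variants, such as suffix trees, are suited to external storage; there are also structures such as the B-trie.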

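Applications:

(1) String retrieval
Store information about a known set of strings (a dictionary) in a trie in advance, then check whether other, unknown strings have occurred, or how often.
Examples:
1. Given a vocabulary list of N known words and an article written entirely in lowercase English, list all words that are not in the vocabulary list, in order of first appearance.
2. Given a dictionary of offensive words, all in lowercase letters, and a text whose lines also consist of lowercase letters, decide whether the text contains any offensive word. For example, if rob is an offensive word, then the text problem contains one.

3. Given 10 million strings, some of which are duplicates, remove the duplicates so that only distinct strings remain.

(2) Text prediction, autocompletion, "see also" suggestions, spell checking

(3) Word-frequency counting

1. A 1 GB file has one word per line, each word at most 16 bytes long, and the memory limit is 1 MB. Return the 100 most frequent words.

2. A text file has about ten thousand lines with one word per line. Find the 10 most frequent words; describe the idea and analyse the time complexity.

3. Finding hot queries: a search engine logs every query string users submit, each 1 to 255 bytes long. Suppose there are currently ten million records. The queries repeat heavily: although there are 10 million in total, no more than 3 million remain after removing duplicates. The more often a query string repeats, the more users issued it and the hotter it is. Find the 10 hottest query strings using no more than 1 GB of memory.
(1) Describe your approach to this problem;
(2) Give the main processing flow, the algorithm, and its complexity.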

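==> With no memory limit: a trie plus a min-heap or max-heap of size k (k being the number of items to find).

Otherwise, first partition the data by hashing, then count word frequencies within each partition with a hash table (using a different hash function), and finally either exploit some property of merge sort (for example partial_sort) or use a method that works on external storage. See:

"Massive data processing: applying merging, heap sort and top-K methods to an interview question" http://www.dataguru.cn/thread-485388-1-1.html

"Algorithm interview question: top-k word frequencies" http://blog.csdn.net/u011077606/article/details/42640867

Notes on Introduction to Algorithms, Chapter 9: Medians and Order Statistics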
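For the no-memory-limit case mentioned above (a trie plus a heap of size k), a minimal sketch might look like this. It assumes lowercase a-z words; `CountTrie`, `Entry`, `MinHeap`, `collect` and `topK` are hypothetical names introduced only for this example.

```cpp
#include <cstddef>
#include <functional>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Hypothetical sketch: count words in a trie, then keep the k most frequent
// ones in a min-heap of size k.
struct CountTrie {
    struct Node { int next[26]; int count; };
    std::vector<Node> nodes;

    CountTrie() { nodes.push_back(Node()); }  // value-initialised: all zeros

    void add(const std::string& w) {
        int cur = 0;
        for (char c : w) {
            int k = c - 'a';
            if (!nodes[cur].next[k]) {
                nodes.push_back(Node());
                nodes[cur].next[k] = static_cast<int>(nodes.size()) - 1;
            }
            cur = nodes[cur].next[k];
        }
        ++nodes[cur].count;  // occurrences of exactly this word
    }
};

typedef std::pair<int, std::string> Entry;  // (frequency, word)
typedef std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry> > MinHeap;

// Depth-first walk that pushes every stored word and evicts the current
// minimum whenever the heap grows beyond k entries.
void collect(const CountTrie& t, int node, std::string& prefix,
             std::size_t k, MinHeap& heap) {
    if (t.nodes[node].count > 0) {
        heap.push(Entry(t.nodes[node].count, prefix));
        if (heap.size() > k) heap.pop();
    }
    for (int c = 0; c < 26; ++c) {
        int child = t.nodes[node].next[c];
        if (child) {
            prefix.push_back(static_cast<char>('a' + c));
            collect(t, child, prefix, k, heap);
            prefix.pop_back();
        }
    }
}

// Returns up to k (frequency, word) pairs, most frequent first.
std::vector<Entry> topK(const CountTrie& t, std::size_t k) {
    MinHeap heap;
    std::string prefix;
    collect(t, 0, prefix, k, heap);
    std::vector<Entry> out;
    while (!heap.empty()) { out.push_back(heap.top()); heap.pop(); }
    return std::vector<Entry>(out.rbegin(), out.rend());
}
```

Counting costs time proportional to the total input length, and the heap step costs O(n log k) for n distinct words, which is why the heap is kept at size k instead of sorting everything.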

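(4) Sorting

A trie is a multiway tree: a preorder traversal of the whole tree that visits children in character order outputs the stored strings in dictionary order.
For example, given N distinct English names, each consisting of a single word, output them sorted in ascending dictionary order.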
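The sorting idea can be sketched as below. This is an illustration under assumptions, not the original author's code: children are kept in a `std::map` so that iteration is already in character order, and `SortNode`, `sortInsert`, `preorder` and `trieSort` are made-up names.

```cpp
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical sketch: std::map iterates its keys in ascending order, so a
// preorder walk visits the stored words in dictionary order.
struct SortNode {
    std::map<char, std::unique_ptr<SortNode> > child;
    bool isWord = false;
};

void sortInsert(SortNode& root, const std::string& w) {
    SortNode* p = &root;
    for (char c : w) {
        std::unique_ptr<SortNode>& next = p->child[c];
        if (!next) next.reset(new SortNode());
        p = next.get();
    }
    p->isWord = true;
}

// Preorder: emit the node's own word first, then recurse into the children.
void preorder(const SortNode& node, std::string& prefix,
              std::vector<std::string>& out) {
    if (node.isWord) out.push_back(prefix);
    for (const auto& kv : node.child) {
        prefix.push_back(kv.first);
        preorder(*kv.second, prefix, out);
        prefix.pop_back();
    }
}

std::vector<std::string> trieSort(const std::vector<std::string>& words) {
    SortNode root;
    for (const std::string& w : words) sortInsert(root, w);
    std::vector<std::string> out;
    std::string prefix;
    preorder(root, prefix, out);
    return out;
}
```

Because a word is emitted before its descendants, a string always comes out ahead of any longer string it is a prefix of, which matches dictionary order.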

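(5) Longest common prefix of strings
A trie saves space by sharing the common prefixes of its strings, so once a large number of strings are stored in one trie, the common prefix of some of them can be obtained quickly.
Example:
Given N strings of lowercase English letters and Q queries, each query asks for the length of the longest common prefix of two of the strings.
Solution: first build the letter tree for all strings. The length of the longest common prefix of two strings is then the number of common ancestors of the nodes where they end, so the problem reduces to the offline Lowest Common Ancestor (LCA) problem.
The LCA problem is itself a classic, and can be solved in several ways:
1. With a disjoint set (union-find), using the classic Tarjan algorithm;
2. Compute the Euler sequence of the letter tree, which turns the task into the classic Range Minimum Query (RMQ) problem;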
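The reduction from longest common prefix to LCA can be illustrated with a small sketch. Note that it finds the LCA by naive parent climbing, not by Tarjan's algorithm or an RMQ, so it only demonstrates the reduction itself; `LcpTrie` and its members are names assumed for this example, and input is assumed to be lowercase a-z.

```cpp
#include <string>
#include <vector>

// Hypothetical sketch: the LCP length of two inserted words equals the depth
// of the lowest common ancestor of their end nodes.
struct LcpTrie {
    struct Node { int next[26]; int parent; int depth; };
    std::vector<Node> nodes;

    LcpTrie() { nodes.push_back(Node()); }  // root: parent 0, depth 0

    // Inserts w and returns the index of its end node.
    int insert(const std::string& w) {
        int cur = 0;
        for (char c : w) {
            int k = c - 'a';
            if (!nodes[cur].next[k]) {
                Node n = Node();
                n.parent = cur;
                n.depth = nodes[cur].depth + 1;
                nodes.push_back(n);
                nodes[cur].next[k] = static_cast<int>(nodes.size()) - 1;
            }
            cur = nodes[cur].next[k];
        }
        return cur;
    }

    // Length of the longest common prefix of the words ending at nodes u and v.
    int lcp(int u, int v) const {
        while (nodes[u].depth > nodes[v].depth) u = nodes[u].parent;
        while (nodes[v].depth > nodes[u].depth) v = nodes[v].parent;
        while (u != v) { u = nodes[u].parent; v = nodes[v].parent; }
        return nodes[u].depth;  // depth of the LCA
    }
};
```

Each query here costs time proportional to the word lengths; the Tarjan and RMQ approaches listed above exist to answer many queries faster than that.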

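(6) Prefix matching in string search
Tries are commonly used for search suggestions. For example, while a URL is being typed, the possible completions can be looked up automatically. When no result matches completely, the candidates sharing the longest prefix can be returned.
A trie lookup takes O(n) time, where n is the length of the word being searched,
whereas a brute-force search, which the source estimates at O(n²), takes quadratic (not exponential) time.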
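A search-suggestion lookup along these lines could be sketched as follows: walk down the trie along the typed prefix, then collect the words stored below that node. `PrefixTrie`, `collect` and `complete` are hypothetical names, the `limit` parameter is an assumption added to cap the number of suggestions, and input is assumed to be lowercase a-z.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical sketch of prefix completion on an array-based trie.
struct PrefixTrie {
    struct Node { int next[26]; bool isWord; };
    std::vector<Node> nodes;

    PrefixTrie() { nodes.push_back(Node()); }  // value-initialised root

    void insert(const std::string& w) {
        int cur = 0;
        for (char c : w) {
            int k = c - 'a';
            if (!nodes[cur].next[k]) {
                nodes.push_back(Node());
                nodes[cur].next[k] = static_cast<int>(nodes.size()) - 1;
            }
            cur = nodes[cur].next[k];
        }
        nodes[cur].isWord = true;
    }

    // Depth-first collection of words below node, stopping at limit results.
    void collect(int node, std::string& cur, std::size_t limit,
                 std::vector<std::string>& out) const {
        if (out.size() >= limit) return;
        if (nodes[node].isWord) out.push_back(cur);
        for (int c = 0; c < 26; ++c) {
            int child = nodes[node].next[c];
            if (child) {
                cur.push_back(static_cast<char>('a' + c));
                collect(child, cur, limit, out);
                cur.pop_back();
            }
        }
    }

    // Up to limit stored words that start with prefix, in dictionary order.
    std::vector<std::string> complete(const std::string& prefix,
                                      std::size_t limit) const {
        std::vector<std::string> out;
        int node = 0;
        for (char c : prefix) {
            node = nodes[node].next[c - 'a'];
            if (!node) return out;  // no stored word has this prefix
        }
        std::string cur = prefix;
        collect(node, cur, limit, out);
        return out;
    }
};
```

Reaching the prefix node costs time proportional to the prefix length only; the collection step then depends on how many suggestions are requested.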

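(7) As an auxiliary structure for other data structures and algorithms
such as suffix trees and the Aho-Corasick automaton.

Suffix trees can be used for full-text search.

An article comparing the speed of several trie implementations: http://www.hankcs.com/nlp/performance-comparison-of-several-trie-tree.html

Tries compared with other data structures: http://www.raychase.net/1783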


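A trie has three main operations: insertion, lookup and deletion. Deleting a single node is rarely needed in practice, so only freeing the whole tree is considered here.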

#include <stdio.h>
#include <string.h>
#include <algorithm>

using namespace std;
#define maxn 26
struct Trie{
    int Flag;      // number of words passing through this node; can be used to count prefixes
    bool End;      // whether this node is the end of a word
    char str[20];  // if this node ends a word (End=true), the whole word can be stored here (at most 19 characters)
    Trie *Next[maxn];
    Trie(){        // initialise the node
        Flag=1;
        End=false;
        str[0]=0;
        memset(Next,0,sizeof(Next));
    }
}*Root;

void Insert(const char *str) // insertion
{
    Trie *p=Root;
    Trie *q=NULL;
    int len=strlen(str);
    for(int i=0;i<len;i++)
    {
        int key=str[i]-'a'; // for digits use str[i]-'0'
        if(!p->Next[key])
        {
            q=new Trie(); // not found: this key is not on the current branch, so create a node
            p->Next[key]=q;
            p=p->Next[key];
        }
        else
        {
            p=p->Next[key];
            p->Flag++; // found: one more word passes through this node (key)
        }
        if(i==len-1) // reached the last character of the string
        {
            p->End=true;
            strcpy(p->str,str);
        }
    }
}

int Query(const char *str) // lookup; choose int, bool or void depending on what is being asked
{
    int len=strlen(str);
    Trie *p=Root;
    for(int i=0;i<len;i++)
    {
        int key=str[i]-'a';
        if(!p->Next[key])
            return 0;
        p=p->Next[key];
    }
    return p->Flag;
}

void Free(Trie* T) // free the whole trie
{
    if(T==NULL)
        return;
    for(int i=0;i<maxn;i++)
    {
        if(T->Next[i])
            Free(T->Next[i]); // recursively
    }
    delete(T);
}

int main()
{
    Root=new Trie();
//    char str[20];
//    if(scanf("%19s",str)==1)
//    {
//        Insert(str);
//        printf("%d",Query(str));
//    }
    Free(Root);
    return 0;
}

A trie can also be stored in a two-dimensional array; if the struct version exceeds the time limit, try the array version.

#include <stdio.h>
#include <string.h>
#include <algorithm>

using namespace std;
#define maxn 26
const int MAXN=2e6+5;
int Trie[MAXN][maxn]; // the tree as a two-dimensional array; 0 means "no child"
int Count[MAXN];      // frequency counts
bool End[MAXN];       // end-of-word flags
int tot;              // total number of nodes

void Insert(const char *str) // insertion
{
    int Root=0;
    int len=strlen(str);
    for(int i=0;i<len;i++)
    {
        int key=str[i]-'a';
        if(!Trie[Root][key])
        {
            Trie[Root][key]=++tot; // equivalent to creating a new node
            Root=Trie[Root][key];
            Count[Root]=1;
        }
        else
        {
            Root=Trie[Root][key];
            Count[Root]++;
        }
    }
    End[Root]=true;
}

bool Query(const char *str) // true if str is a prefix of some inserted word
{
    int Root=0;
    int len=strlen(str);
    for(int i=0;i<len;i++)
    {
        int key=str[i]-'a';
        if(!Trie[Root][key])
            return false;
        Root=Trie[Root][key];
    }
    return true;
}

void Clear()
{
    for(int i=0;i<=tot;i++) // note: <=tot, because node indices run from 0 to tot
    {
        End[i]=false;
        Count[i]=0;
        for(int j=0;j<maxn;j++)
        {
            Trie[i][j]=0;
        }
    }
    tot=0;
}

int main()
{
//    char str[20];
//    if(scanf("%19s",str)==1)
//    {
//        Insert(str);
//        if(Query(str))
//            printf("Yes\n");
//    }
//    Clear();
    return 0;
}

Example problems and approaches

统计难题 (A Counting Puzzle)

HDU1251

Problem Description

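Ignatius has recently run into a problem: his teacher gave him many words (consisting only of lowercase letters, with no duplicates) and now wants him to count the words that have a given string as a prefix (a word is also a prefix of itself).

Input

The first part of the input is a word list, one word per line, each at most 10 characters long; these are the words the teacher gave Ignatius. A blank line marks the end of the list. The second part is a series of queries, one per line, each a string.

Note: there is only one test case; process until end of file.

Output

For each query, output the number of words that have that string as a prefix.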

Sample Input

banana
band
bee
absolute
acm

ba
b
band
abc

Sample Output

2
3
1
0

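To count the words that have a given string as a prefix, build the trie while recording how many times each node is visited, then answer each query by looking up that count. This is a template problem.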

#include <cstdio>
#include <cstring>
#include <cstdlib>
#include <algorithm>
#define MAXN 26
using namespace std;

struct Trie
{
    Trie *Next[MAXN];
    int Flag;          // number of words passing through this node
    Trie()
    {
        Flag=1;
        memset(Next,0,sizeof(Next));
    }
};

struct Trie* Root;

void Insert(char* str)
{
    Trie *p,*q;
    p=Root;
    int len=strlen(str);
    for(int i=0;i<len;++i)
    {
        int key=str[i]-'a';
        if(p->Next[key]==NULL)
        {
            q=new Trie();
            p->Next[key]=q;
            p=p->Next[key];
        }
        else
        {
            p=p->Next[key];
            ++p->Flag;
        }
    }
}

int Query(char *str)
{
    int len=strlen(str);
    Trie* p=Root;
    for(int i=0;i<len;++i)
    {
        int key=str[i]-'a';
        if(p->Next[key]==NULL)
            return 0;
        p=p->Next[key];
    }
    return p->Flag;
}

void Free(Trie* T)
{
    if(T==NULL) return;
    for(int i=0;i<MAXN;i++)
    {
        if(T->Next[i]) Free(T->Next[i]);
    }
    delete(T);
}

int main()
{
    //freopen("sample.txt","r",stdin);
    char str[15];
    Root=new Trie();
    // The original used gets(), which has been removed from the language; fgets is the safe replacement.
    while(fgets(str,sizeof(str),stdin))
    {
        str[strcspn(str,"\r\n")]=0; // strip the line ending
        if(str[0]==0) break;        // a blank line ends the word list
        Insert(str);
    }
    while(~scanf("%14s",str))
    {
        printf("%d\n",Query(str));
    }
    Free(Root);
    return 0;
}


单词数 (Word Count)

HDU2072

Problem Description

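lily's good friend xiaoou333 has had a lot of free time lately and came up with a rather pointless task: counting the number of distinct words in an article. Your job is to help xiaoou333 solve this problem.

Input

There are multiple test cases, one per line; each is a short article consisting of lowercase letters and spaces, with no punctuation. A line containing # marks the end of the input.

Output

For each case output a single integer on its own line: the number of distinct words in the article.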

Sample Input

you are my friend
#

Sample Output

4

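To count the distinct words, insert every word into the trie, then traverse the trie and count the nodes that carry an End mark.

Hint: the input contains multiple test cases in the first format described above; a solution that reads into a char array gets rejected.

Here istringstream is a quick way to extract words from a string by whitespace: an istringstream object can be bound to one line of text and then splits that line on spaces. (The header sstream must be included.)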
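As a small standalone illustration of the istringstream technique just described, the helper below splits one line into words. The function name `splitWords` is made up for this sketch.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Splits one line into whitespace-separated words using std::istringstream.
std::vector<std::string> splitWords(const std::string& line) {
    std::istringstream is(line);  // bind the stream to the line
    std::vector<std::string> words;
    std::string w;
    while (is >> w) words.push_back(w);  // operator>> skips runs of whitespace
    return words;
}
```

Because extraction with `>>` skips any amount of whitespace, consecutive spaces and leading or trailing spaces need no special handling.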

#include <cstdio>
#include <iostream>
#include <cstring>
#include <string>
#include <algorithm>
#include <sstream>
#define MAXN 26
using namespace std;

struct Trie
{
    Trie *Next[MAXN];
    int flag;
    int end;
    Trie()
    {
        flag=1;
        end=0;
        memset(Next,0,sizeof(Next));
    }
}*Root;

void Insert(const string &str)
{
    Trie *p=Root;
    Trie *q;
    int len=str.size();
    for(int i=0;i<len;++i)
    {
        int key=str[i]-'a';
        if(p->Next[key]==NULL)
        {
            q=new Trie();
            p->Next[key]=q;
            p=p->Next[key];
        }
        else
        {
            p=p->Next[key];
            ++p->flag;
        }
        if(i==len-1)
            p->end=1;
    }
}

void visit(Trie* T,int &sum) // traverse the trie, counting end-of-word nodes
{
    if(T==NULL)
        return;
    if(T->end==1)
        sum++;
    for(int i=0;i<MAXN;i++)
    {
        visit(T->Next[i],sum);
    }
}

void Free(Trie* T)
{
    if(T==NULL) return;
    for(int i=0;i<MAXN;i++)
    {
        if(T->Next[i]) Free(T->Next[i]);
    }
    delete(T);
}

int main()
{
    string str;
    string tem;  // char arrays were rejected; switching to string got it accepted
    int sum=0;
    Root=new Trie();
    while(getline(cin,str)&&str!="#")
    {
        istringstream is(str); // is acts as a stream over the words of this line
        while(is>>tem)         // extract the words one by one into tem
        {
            Insert(tem);
        }
        visit(Root,sum);
        printf("%d\n",sum);
        Free(Root);
        Root=new Trie();
        sum=0;
    }
    Free(Root);
    return 0;
}


Shortest Prefixes

POJ2001

Description

A prefix of a string is a substring starting at the beginning of the given string. The prefixes of "carbon" are: "c", "ca", "car", "carb", "carbo", and "carbon". Note that the empty string is not considered a prefix in this problem, but every non-empty string is considered to be a prefix of itself. In everyday language, we tend to abbreviate words by prefixes. For example, "carbohydrate" is commonly abbreviated by "carb". In this problem, given a set of words, you will find for each word the shortest prefix that uniquely identifies the word it represents.

In the sample input below, "carbohydrate" can be abbreviated to "carboh", but it cannot be abbreviated to "carbo" (or anything shorter) because there are other words in the list that begin with "carbo".

An exact match will override a prefix match. For example, the prefix "car" matches the given word "car" exactly. Therefore, it is understood without ambiguity that "car" is an abbreviation for "car" , not for "carriage" or any of the other words in the list that begins with "car".

Input

The input contains at least two, but no more than 1000 lines. Each line contains one word consisting of 1 to 20 lower case letters.

Output

The output contains the same number of lines as the input. Each line of the output contains the word from the corresponding line of the input, followed by one blank space, and the shortest prefix that uniquely (without ambiguity) identifies this word.

Sample Input

carbohydrate
cart
carburetor
caramel
caribou
carbonic
cartilage
carbon
carriage
carton
car
carbonate

Sample Output

carbohydrate carboh
cart cart
carburetor carbu
caramel cara
caribou cari
carbonic carboni
cartilage carti
carbon carbon
carriage carr
carton carto
car car
carbonate carbona

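Find the shortest prefix that identifies each string, i.e. a prefix that no other string has; if there is none, output the string itself.

The approach is straightforward: build the trie, then for each word walk down from the root until reaching a node whose visit count is 1, meaning only this word passes through it, and stop there.

One could also build the prefix by appending characters to a string and returning it directly, which shortens the code a little.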

#include <cstdio>
#include <cstring>
#include <algorithm>

using namespace std;
#define MAXN 26
const int maxn=2e6+5;
int Trie[maxn][MAXN];
int Count[maxn];   // number of words passing through each node
bool End[maxn];
int tot;

void Find(char *str) // print the shortest prefix that identifies str
{
    int len=strlen(str);
    int Root=0;
    for(int i=0;i<len;i++)
    {
        int key=str[i]-'a';
        Root=Trie[Root][key];
        printf("%c",str[i]);
        if(Count[Root]==1) // only this word passes through here: the prefix is unique
        {
            printf("\n");
            return;
        }
    }
    printf("\n"); // no unique prefix: the whole word has been printed
    return ;
}

void Insert(char *str)
{
    int len=strlen(str);
    int Root=0;
    for(int i=0;i<len;i++)
    {
        int key=str[i]-'a';
        if(!Trie[Root][key])
        {
            Trie[Root][key]=++tot;
            Count[Trie[Root][key]]=1;
        }
        else
        {
            Count[Trie[Root][key]]++;
        }
        Root=Trie[Root][key];
    }
    End[Root]=true;
}

void init()
{
    for(int i=0;i<=tot;i++)
    {
        End[i]=false;
        Count[i]=0;
        for(int j=0;j<MAXN;j++)
        {
            Trie[i][j]=0;
        }
    }
    tot=0;
}
char ss[1005][21];
int main()
{
    int num=0;
    while(scanf("%20s",ss[num])!=EOF)
    {
        Insert(ss[num++]);
    }
    for(int i=0;i<num;i++)
    {
        printf("%s ",ss[i]);
        Find(ss[i]);
    }
    init();
    return 0;
}


Phone List

POJ3630

Description

Given a list of phone numbers, determine if it is consistent in the sense that no number is the prefix of another. Let's say the phone catalogue listed these numbers:

Emergency 911
Alice 97 625 999
Bob 91 12 54 26

In this case, it's not possible to call Bob, because the central would direct your call to the emergency line as soon as you had dialled the first three digits of Bob's phone number. So this list would not be consistent.

Input

The first line of input gives a single integer, 1 ≤ t ≤ 40, the number of test cases. Each test case starts with n, the number of phone numbers, on a separate line, 1 ≤ n ≤ 10000. Then follows n lines with one unique phone number on each line. A phone number is a sequence of at most ten digits.

Output

For each test case, output "YES" if the list is consistent, or "NO" otherwise.

Sample Input

Sample Output

NO
YES

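Given a set of strings, decide whether no string is a prefix of another.

If a string X = x1x2...xn is a prefix of a string Y = y1y2...ym, two cases arise at insertion time: X is inserted before Y, or X is inserted after Y.

1) If X is inserted before Y, then inserting Y necessarily walks along X's path, so a prefix of Y can be detected by checking whether any node on that path is already marked as the end of a complete string;

2) If X is inserted after Y, then once the insertion finishes, the next array of the node the current pointer refers to is not entirely NULL or 0.

Earlier attempts kept exceeding the time limit, so I replaced the linked structure with a two-dimensional array.

And then this version got Wrong Answer: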

#include <cstdio>
#include <cstring>
#include <algorithm>

using namespace std;
#define MAXN 10
const int maxn=2e6+5;
int Trie[maxn][MAXN];
bool End[maxn];
int tot;

void init()
{
    for(int i=0;i<tot;i++)   // BUG (kept on purpose): node indices run from 0 to tot, so node tot is never cleared
    {
        End[i]=false;
        for(int j=0;j<MAXN;j++)
        {
            Trie[i][j]=0;
        }
    }
    tot=0;
}

int main()
{
    int t;
    scanf("%d",&t);
    while(t--)
    {
        int n;
        scanf("%d",&n);
        int flag=1;
        while(n--)
        {
            char str[20];
            scanf("%19s",str);   // the original used gets(), which is no longer part of the language
            int len=strlen(str);
            int Root=0;
            for(int i=0;i<len;i++)
            {
                int key=str[i]-'0';
                if(!Trie[Root][key])
                    Trie[Root][key]=++tot;
                else if(End[Trie[Root][key]]==true)
                {
                    flag=0;   // an earlier number is a prefix of this one
                }
                Root=Trie[Root][key];
                if(i==len-1)
                    End[Root]=true;
            }
            for(int i=0;i<MAXN;i++)
            {
                if(Trie[Root][i]!=0)
                {
                    flag=0;   // this number is a prefix of an earlier one
                    break;
                }
            }
        }
        if(flag)
            printf("YES\n");
        else
            printf("NO\n");
        init();
    }
    return 0;
}


It finally passed: the problem was in the init function. Changing i < tot to i <= tot got it accepted. This one cost me a long time.

#include <cstdio>
#include <cstring>
#include <algorithm>

using namespace std;
#define MAXN 10
const int maxn=2e6+5;
int Trie[maxn][MAXN];
bool End[maxn];
int tot;

void init()
{
    for(int i=0;i<=tot;i++)   // changing < to <= here got it accepted
    {
        End[i]=false;
        for(int j=0;j<MAXN;j++)
        {
            Trie[i][j]=0;
        }
    }
    tot=0;
}

int main()
{
    //freopen("sampletem.txt","r",stdin);
    int t;
    scanf("%d",&t);
    while(t--)
    {
        int n;
        scanf("%d",&n);
        int flag=1;
        while(n--)
        {
            char str[20];
            scanf("%19s",str);   // the original used gets(), which is no longer part of the language
            int len=strlen(str);
            int Root=0;
            for(int i=0;i<len;i++)
            {
                int key=str[i]-'0';
                if(!Trie[Root][key])
                    Trie[Root][key]=++tot;
                else if(End[Trie[Root][key]]==true)
                {
                    flag=0;   // an earlier number is a prefix of this one
                }
                Root=Trie[Root][key];
                if(i==len-1)
                    End[Root]=true;
            }
            for(int i=0;i<MAXN;i++)
            {
                if(Trie[Root][i]!=0)
                {
                    flag=0;   // this number is a prefix of an earlier one
                    break;
                }
            }
        }
        if(flag)
            printf("YES\n");
        else
            printf("NO\n");
        init();
    }
    return 0;
}


T9

POJ1451

Description

Background

A while ago it was quite cumbersome to create a message for the Short Message Service (SMS) on a mobile phone. This was because you only have nine keys and the alphabet has more than nine letters, so most characters could only be entered by pressing one key several times. For example, if you wanted to type "hello" you had to press key 4 twice, key 3 twice, key 5 three times, again key 5 three times, and finally key 6 three times. This procedure is very tedious and keeps many people from using the Short Message Service.

This led manufacturers of mobile phones to try and find an easier way to enter text on a mobile phone. The solution they developed is called T9 text input. The "9" in the name means that you can enter almost arbitrary words with just nine keys and without pressing them more than once per character. The idea of the solution is that you simply start typing the keys without repetition, and the software uses a built-in dictionary to look for the "most probable" word matching the input. For example, to enter "hello" you simply press keys 4, 3, 5, 5, and 6 once. Of course, this could also be the input for the word "gdjjm", but since this is no sensible English word, it can safely be ignored. By ruling out all other "improbable" solutions and only taking proper English words into account, this method can speed up writing of short messages considerably. Of course, if the word is not in the dictionary (like a name) then it has to be typed in manually using key repetition again.

Figure 8: The Number-keys of a mobile phone.

More precisely, with every character typed, the phone will show the most probable combination of characters it has found up to that point. Let us assume that the phone knows about the words "idea" and "hello", with "idea" occurring more often. Pressing the keys 4, 3, 5, 5, and 6, one after the other, the phone offers you "i", "id", then switches to "hel", "hell", and finally shows "hello".

Problem

Write an implementation of the T9 text input which offers the most probable character combination after every keystroke. The probability of a character combination is defined to be the sum of the probabilities of all words in the dictionary that begin with this character combination. For example, if the dictionary contains three words "hell", "hello", and "hellfire", the probability of the character combination "hell" is the sum of the probabilities of these words. If some combinations have the same probability, your program is to select the first one in alphabetic order. The user should also be able to type the beginning of words. For example, if the word "hello" is in the dictionary, the user can also enter the word "he" by pressing the keys 4 and 3 even if this word is not listed in the dictionary.

Input

The first line contains the number of scenarios.
Each scenario begins with a line containing the number w of distinct words in the dictionary (0<=w<=1000). These words are given in the next w lines. (They are not guaranteed in ascending alphabetic order, although it's a dictionary.) Every line starts with the word which is a sequence of lowercase letters from the alphabet without whitespace, followed by a space and an integer p, 1<=p<=100, representing the probability of that word. No word will contain more than 100 letters.
Following the dictionary, there is a line containing a single integer m. Next follow m lines, each consisting of a sequence of at most 100 decimal digits 2-9, followed by a single 1 meaning "next word".

Output

The output for each scenario begins with a line containing "Scenario #i:", where i is the number of the scenario starting at 1.
For every number sequence s of the scenario, print one line for every keystroke stored in s, except for the 1 at the end. In this line, print the most probable word prefix defined by the probabilities in the dictionary and the T9 selection rules explained above. Whenever none of the words in the dictionary match the given number sequence, print "MANUALLY" instead of a prefix.
Terminate the output for every number sequence with a blank line, and print an additional blank line at the end of every scenario.

Sample Input

2
5
hell 3
hello 4
idea 8
next 8
super 3
2
435561
43321
7
another 5
contest 6
follow 3
give 13
integer 6
new 14
program 4
5
77647261
6391
4681
26684371
77771

Sample Output

Scenario #1:
i
id
hel
hell
hello

i
id
ide
idea


Scenario #2:
p
pr
pro
prog
progr
progra
program

n
ne
new

g
in
int

c
co
con
cont
anoth
anothe
another

p
pr
MANUALLY
MANUALLY

Hat’s Words

HDU1247

Problem Description

A hat’s word is a word in the dictionary that is the concatenation of exactly two other words in the dictionary.
You are to find all the hat’s words in a dictionary.

Input

Standard input consists of a number of lowercase words, one per line, in alphabetical order. There will be no more than 50,000 words.
Only one case.

Output

Your output should contain all the hat’s words, one per line, in alphabetical order.

Sample Input

a
ahat
hat
hatword
hziee
word

Sample Output

ahat
hatword

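Given n words in dictionary order, find all words that are the concatenation of two other words, and output them in dictionary order.

(Recommended) Approach 1: build the trie first, then for each word enumerate the split points, cutting it into two parts, and check whether both parts are in the trie;

Approach 2: build a trie graph from all the words, then enumerate every word and decide on the trie graph whether it is made of two other words. Concretely, walk along the path; whenever a node that ends a word is reached, match the remainder of the word starting again from the root and see whether it matches a whole word. If so, the word qualifies; otherwise keep matching...

Approach 3: build two tries and insert every word into one forwards and into the other reversed. When querying a word, search it both forwards and reversed, marking each index where a prefix is a dictionary word and each index where a suffix is a dictionary word. Finally scan the mark arrays: if some position is marked by both a prefix and a suffix, the word can be split into two other words from the list.

At first I could not work out how to enumerate: I was thinking of picking one word and then somehow choosing the other two, and wondering how many loops that would take. Only after reading other people's solutions did I realise that the word itself should be split into two parts. I also learned about strncpy: use the string functions where they fit, since they exist for convenience!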

#include <cstdio>
#include <iostream>
#include <string>
#include <string.h>
#include <stdlib.h>
#include <algorithm>
using namespace std;
#define MAXN 26

char word[50005][50];
char ss1[50],ss2[50];
int num;

struct Trie{
    int Flag;   // 1 if a word ends at this node
    Trie *Next[MAXN];
    Trie()
    {
        Flag=0;
        memset(Next,0,sizeof(Next));
    }
}*Root;

void Insert(char *str)
{
    Trie *p=Root;
    Trie *q=NULL;
    int len=strlen(str);
    for(int i=0;i<len;i++)
    {
        int key=str[i]-'a';
        if(!p->Next[key])
        {
            q=new Trie();
            p->Next[key]=q;
        }
        p=p->Next[key];
        if(i==len-1)
            p->Flag=1;
    }
    return;
}

int Query(char *str)
{
    Trie *p=Root;
    int len=strlen(str);
    for(int i=0;i<len;i++)
    {
        int key=str[i]-'a';
        if(!p->Next[key])
        {
            return 0;
        }
        p=p->Next[key];
    }
    return p->Flag;
}

void Find(char *str)
{
    int len=strlen(str);
    for(int m=1;m<len;m++)
    {
        // split with strncpy: ss1 = first m characters, ss2 = the rest
        strncpy(ss1,str,m);
        ss1[m]=0;
        strcpy(ss2,str+m);
//      without strncpy:
//        int j=0;
//        for(int i=0;i<m;i++)
//        {
//            ss1[j++]=str[i];
//        }
//        ss1[j]=0;
//        j=0;
//        for(int i=m;i<len;i++)
//        {
//            ss2[j++]=str[i];
//        }
//        ss2[j]=0;
        //printf("ss1=%s ss2=%s\n",ss1,ss2);
        if(Query(ss1)&&Query(ss2))
        {
            puts(str);
            break;   // without this break a word with several valid splits is printed more than once (WA)
        }
    }
}

void Free(Trie *T)
{
    if(T==NULL) return;
    for(int i=0;i<MAXN;i++)
    {
        if(T->Next[i]) Free(T->Next[i]);
    }
    delete(T);
}

int main()
{
    //freopen("sampletem.txt","r",stdin);   // local testing only
    char str[50];
    Root=new Trie();
    while(~scanf("%49s",str))
    {
        strcpy(word[num++],str);
        Insert(str);
    }
    for(int i=0;i<num;i++)
    {
        Find(word[i]);
    }
    Free(Root);
    return 0;
}


Anagram Groups

POJ2408

World-renowned Prof. A. N. Agram's current research deals with large anagram groups. He has just found a new application for his theory on the distribution of characters in English language texts. Given such a text, you are to find the largest anagram groups.
A text is a sequence of words. A word w is an anagram of a word v if and only if there is some permutation p of character positions that takes w to v. Then, w and v are in the same anagram group. The size of an anagram group is the number of words in that group. Find the 5 largest anagram groups.

Input

The input contains words composed of lowercase alphabetic characters, separated by whitespace(or new line). It is terminated by EOF. You can assume there will be no more than 30000 words.

Output

Output the 5 largest anagram groups. If there are less than 5 groups, output them all. Sort the groups by decreasing size. Break ties lexicographically by the lexicographical smallest element. For each group output, print its size and its member words. Sort the member words lexicographically and print equal words only once.

Sample Input

undisplayed
trace
tea
singleton
eta
eat
displayed
crate
cater
carte
caret
beta
beat
bate
ate
abet

Sample Output

Group of size 5: caret carte cater crate trace .
Group of size 4: abet bate beat beta .
Group of size 4: ate eat eta tea .
Group of size 1: displayed .
Group of size 1: singleton .

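Put words that can be rearranged into the same word into one set, then output the five largest sets in decreasing order of size; sets of equal size are output in ascending dictionary order.

Sort the characters of each string in ascending order to obtain a canonical string. If two strings have the same canonical (lexicographically smallest) string, they belong to the same group. So the trie records this canonical string and maps its end node to an index into an array of group structs (each holding the strings and their count; since the output must not repeat strings, a set is used, which means function parameters must be passed by reference, otherwise it is very slow). Finally sort by count.

Whether linked or array-based, it kept timing out; the algorithm itself seemed too slow.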

#include <cstdio>
#include <iostream>
#include <string>
#include <string.h>
#include <set>
#include <stdlib.h>
#include <algorithm>
using namespace std;
#define MAXN 26
const int maxn=1e6+5;
int Trie[maxn][MAXN];
int tot;
int To[maxn];   // To[node] = group index + 1; 0 means no group has been assigned yet
int num;        // number of groups

struct Group{
    int size;
    set<string> st;
}group[50005];

bool cmp(const Group &a,const Group &b)
{
    if(a.size!=b.size)
        return a.size>b.size;
    else
        return *(a.st.begin()) < *(b.st.begin());
}

void Insert(char *str)
{
    int Root=0;
    string ss=str;   // keep the original word before its letters are sorted
    int len=strlen(str);
    sort(str,str+len);
    for(int i=0;i<len;i++)
    {
        int key=str[i]-'a';
        if(!Trie[Root][key])
        {
            Trie[Root][key]=++tot;
        }
        Root=Trie[Root][key];
    }
    if(!To[Root])
    {
        To[Root]=++num;   // offset by one so that the first group is not mistaken for "unassigned"
    }
    group[To[Root]-1].size++;
    group[To[Root]-1].st.insert(ss);
    return;
}

int main()
{
    //freopen("sampletem.txt","r",stdin);
    char str[50];
    while(scanf("%49s",str)==1)   // words are separated by arbitrary whitespace
    {
        Insert(str);
    }
    sort(group,group+num,cmp);
    int n;
    if(num>=5)
        n=5;
    else    n=num;
    for(int i=0;i<n;i++)
    {
        printf("Group of size %d: ",group[i].size);
        set<string>::iterator it;
        for(it=group[i].st.begin();it!=group[i].st.end();it++)
        {
            printf("%s ",(*it).c_str());
        }
        printf(".\n");
    }
    return 0;
}


The array version was accepted after switching to the G++ compiler.

The struct version still timed out after the switch:

#include <cstdio>
#include <iostream>
#include <string>
#include <string.h>
#include <set>
#include <stdlib.h>
#include <algorithm>
using namespace std;
#define MAXN 26
int num;   // number of groups
struct Group{
    int size;
    set<string> st;
}group[50005];

struct Trie{
    Trie *Next[MAXN];
    int To;   // group index + 1; 0 means no group has been assigned yet
    Trie()
    {
        To=0;
        memset(Next,0,sizeof(Next));
    }
}*Root;

// Pass by const reference: passing a Group by value copies its whole set on every comparison.
bool cmp(const Group &a,const Group &b)
{
    if(a.size!=b.size)
        return a.size>b.size;
    else
        return *(a.st.begin()) < *(b.st.begin());
}

void Insert(char *str)
{
    Trie *p=Root;
    Trie *q=NULL;
    string ss=str;   // keep the original word before its letters are sorted
    int len=strlen(str);
    sort(str,str+len);
    //puts(str);
    for(int i=0;i<len;i++)
    {
        int key=str[i]-'a';
        if(!p->Next[key])
        {
            q=new Trie();
            p->Next[key]=q;
        }
        p=p->Next[key];
    }
    if(p->To==0)
    {
        p->To=++num;   // offset by one so that the first group is not mistaken for "unassigned"
    }
    group[p->To-1].size++;
    group[p->To-1].st.insert(ss);
    return;
}

void Free(Trie *T)
{
    if(T==NULL) return;
    for(int i=0;i<MAXN;i++)
    {
        if(T->Next[i]) Free(T->Next[i]);
    }
    delete(T);
}

int main()
{
    //freopen("sampletem.txt","r",stdin);   // local testing only
    char str[50];
    Root=new Trie();
    while(scanf("%49s",str)==1)   // words are separated by arbitrary whitespace
    {
        Insert(str);
    }
    sort(group,group+num,cmp);
    int n;
    if(num>=5)
        n=5;
    else    n=num;
    for(int i=0;i<n;i++)
    {
        printf("Group of size %d: ",group[i].size);
        set<string>::iterator it;
        for(it=group[i].st.begin();it!=group[i].st.end();it++)
        {
            printf("%s ",(*it).c_str());
        }
        printf(".\n");
    }
    Free(Root);
    return 0;
}


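It still does not pass. It keeps timing out and I do not know why; I am tempted to stop using a trie for this problem.

A classmate's solution using map: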

//MADE BY Y_is_sunshine;
//#include <bits/stdc++.h>
//#include <memory.h>
#include <algorithm>
#include <iostream>
#include <cstdlib>
#include <cstring>
#include <sstream>
#include <cstdio>
#include <vector>
#include <string>
#include <cmath>
#include <queue>
#include <stack>
#include <map>
#include <set>

#define MAXN 26

using namespace std;

int main()
{
    //freopen("data.txt", "r", stdin);

    map<string, int> mp1;
    map<string, set<string>> mp2;

    string s1;
    while (cin >> s1) {
        string s2;
        s2 = s1;
        sort(s2.begin(), s2.end());
        mp1[s2]++;
        mp2[s2].insert(s1);
    }

    int T = 5;
    while (T--) {
        int maxn = -1;
        string s1, s2;
        for (map<string, int>::iterator it1 = mp1.begin(); it1 != mp1.end(); it1++) {
            if (it1->second > maxn) {
                maxn = it1->second;
                s2 = it1->first;
            }
            else if (it1->second == maxn) {
                string s3, s4;
                s3 = *mp2[s2].begin();
                s4 = *mp2[it1->first].begin();
                if (s3 > s4) {
                    maxn = it1->second;
                    s2 = it1->first;
                }
            }
        }
        if (mp1[s2]) {
            cout << "Group of size " << mp1[s2] << ": ";
            mp1[s2] = 0;
            for (set<string>::iterator it1 = mp2[s2].begin(); it1 != mp2[s2].end(); it1++)
                cout << *it1 << ' ';
            cout << "." << endl;
        }
    }

    //freopen("CON", "r", stdin);
    //system("pause");
    return 0;
}


What Are You Talking About

HDU1075

Problem Description

Ignatius is so lucky that he met a Martian yesterday. But he didn't know the language the Martians use. The Martian gives him a history book of Mars and a dictionary when it leaves. Now Ignatius want to translate the history book into English. Can you help him?

Input

The problem has only one test case, the test case consists of two parts, the dictionary part and the book part. The dictionary part starts with a single line contains a string "START", this string should be ignored, then some lines follow, each line contains two strings, the first one is a word in English, the second one is the corresponding word in Martian's language. A line with a single string "END" indicates the end of the directory part, and this string should be ignored. The book part starts with a single line contains a string "START", this string should be ignored, then an article written in Martian's language. You should translate the article into English with the dictionary. If you find the word in the dictionary you should translate it and write the new word into your translation, if you can't find the word in the dictionary you do not have to translate it, and just copy the old word to your translation. Space(' '), tab('\t'), enter('\n') and all the punctuation should not be translated. A line with a single string "END" indicates the end of the book part, and that's also the end of the input. All the words are in the lowercase, and each word will contain at most 10 characters, and each line will contain at most 3000 characters.

Output

In this problem, you have to output the translation of the history book.

Sample Input

START
from fiwo
hello difh
mars riwosf
earth fnnvk
like fiiwj
END
START
difh, i'm fiwo riwosf.
i fiiwj fnnvk!
END

Sample Output

hello, i'm from mars.
i like earth!

Hint

Huge input, scanf is recommended.

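The problem gives a mapping between Martian words and English words, followed by a text written in Martian. Every word that appears in the mapping is replaced by its translation; everything else is output unchanged.

A classic trie problem: dictionary translation.

I got WA once, and it turned out that I was picking the wrong word to translate. If a longer translatable word contains a shorter translatable word, a scan of the sentence reaches the shorter word first. The WA version translated that short word immediately, which is wrong: the whole longer word has to be looked up, rather than translating whatever matches first.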
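The pitfall above can be illustrated with a small sketch. It is not the submitted solution: the function name `translate_line` is made up for illustration, and a `std::map` stands in for the trie so that only the word-boundary logic is visible. Each maximal run of lowercase letters is looked up as a whole, so a dictionary entry that is merely a prefix of a longer word never fires early.

```cpp
#include <cassert>
#include <map>
#include <string>

// Translate one line by looking up maximal runs of lowercase letters.
// dict maps a Martian word to its English word (hypothetical stand-in for the trie).
std::string translate_line(const std::string& line,
                           const std::map<std::string, std::string>& dict) {
    // The sketch expects a single line without its trailing newline.
    assert(line.find('\n') == std::string::npos);
    std::string out, word;
    for (std::size_t i = 0; i <= line.size(); ++i) {
        // Treat the end of the line as a delimiter so the last word is flushed.
        char c = (i < line.size()) ? line[i] : '\0';
        if (c >= 'a' && c <= 'z') {
            word += c;                       // still inside a word: keep collecting
            continue;
        }
        if (!word.empty()) {                 // the word just ended: look up the whole word
            std::map<std::string, std::string>::const_iterator it = dict.find(word);
            if (it != dict.end()) out += it->second;
            else out += word;
            word.clear();
        }
        if (i < line.size()) out += c;       // copy punctuation and whitespace unchanged
    }
    return out;
}
```

With both `ab` and `abcd` in the dictionary, the input `abcd` is looked up once as `abcd`; the shorter entry is only used where `ab` appears as a complete word.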

My program ran in 281 ms with a pointer-linked trie; an array-based trie would probably be faster.
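As a rough idea of what the array-based alternative could look like, here is a minimal sketch. The names `ArrayTrie`, `insert` and `find` are assumptions made for illustration and do not come from the original solution. All nodes sit in one contiguous vector and children are stored as indices, which avoids one heap allocation per node; whether this is actually faster on the judge has not been measured here.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Array-backed trie over 'a'..'z'. Children are indices into `nodes`;
// 0 means "no child", which is safe because node 0 is the root and the
// root is never a child of another node.
struct ArrayTrie {
    struct Node {
        int next[26];
        int value;   // index into `values`, or -1 if no word ends here
        Node() : value(-1) { for (int i = 0; i < 26; ++i) next[i] = 0; }
    };
    std::vector<Node> nodes;
    std::vector<std::string> values;

    ArrayTrie() : nodes(1) {}   // start with the root only

    void insert(const std::string& key, const std::string& val) {
        int cur = 0;
        for (std::size_t i = 0; i < key.size(); ++i) {
            assert(key[i] >= 'a' && key[i] <= 'z');
            int k = key[i] - 'a';
            if (nodes[cur].next[k] == 0) {
                nodes.push_back(Node());
                nodes[cur].next[k] = static_cast<int>(nodes.size()) - 1;
            }
            cur = nodes[cur].next[k];
        }
        if (nodes[cur].value < 0) {          // new word: allocate a value slot
            nodes[cur].value = static_cast<int>(values.size());
            values.push_back(val);
        } else {                             // existing word: overwrite
            values[nodes[cur].value] = val;
        }
    }

    // Returns the stored value, or nullptr if key is not a complete word.
    const std::string* find(const std::string& key) const {
        int cur = 0;
        for (std::size_t i = 0; i < key.size(); ++i) {
            if (key[i] < 'a' || key[i] > 'z') return nullptr;
            cur = nodes[cur].next[key[i] - 'a'];
            if (cur == 0) return nullptr;
        }
        return nodes[cur].value >= 0 ? &values[nodes[cur].value] : nullptr;
    }
};
```

Usage would be along the lines of `trie.insert("fiwo", "from")` while reading the dictionary, then `trie.find(word)` for each word of the book, printing the original word when the result is `nullptr`.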

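Problem summary:

There is a single test case in two parts. The first part is the dictionary: a list of words with their translations, starting with START and ending with END. The second part is the text to translate: a number of sentences, again starting with START and ending with END, which must be translated from Martian into English using the dictionary. Only words that have an entry in the dictionary are translated; words without an entry, punctuation and whitespace are output unchanged.

Approach:

Build a trie from the Martian words in the dictionary, that is, the right-hand word on each line. Store the corresponding English translation in the node where the word ends. For the text, read a whole line and walk through it character by character, collecting consecutive letters into a buffer until a non-letter character appears. At that point, look the buffered word up in the trie: if it has a translation, output the translation; otherwise output the original word.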

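Notes:

1. Input is read with scanf + gets.

2. Be careful about which word gets translated: any non-letter character marks the end of a word.

3. Avoid cin and cout where possible; they can easily cause a timeout.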
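Since `gets` has been removed from standard C++, one possible replacement for the scanf + gets input mentioned in note 1 is `fgets` combined with `sscanf`. The sketch below is an assumption about how this could be done, not the code that was submitted; `read_line` and `parse_entry` are hypothetical helper names. It assumes each dictionary word is at most 10 characters, as the problem statement says.

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>

// Read one line with fgets and strip the trailing newline, if any.
// Returns false at end of input.
bool read_line(std::FILE* in, char* buf, int cap) {
    if (!std::fgets(buf, cap, in)) return false;
    buf[std::strcspn(buf, "\r\n")] = '\0';
    return true;
}

// Parse one dictionary line of the form "<english> <martian>".
// Both output buffers must hold at least 11 bytes.
// Returns false for "START", "END", or any other line without two words.
bool parse_entry(const char* line, char* english, char* martian) {
    assert(line && english && martian);
    // %10s caps each word at 10 characters, matching the problem's limit.
    return std::sscanf(line, "%10s %10s", english, martian) == 2;
}
```

A reading loop could call `read_line(stdin, buf, sizeof buf)` with a buffer a little larger than the 3000-character line limit, and stop the dictionary part at the first line that equals `END`.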

#include <cstdio>
#include <iostream>
#include <string>
#include <string.h>
#include <stdlib.h>
#include <algorithm>
using namespace std;
#define MAXN 26
string fin="";          // accumulated translation, printed once at the end

struct Trie{
    int Flag;           // 1 if a Martian word ends at this node
    Trie *Next[MAXN];
    string chc;         // English translation, stored at the word's last node
    Trie()
    {
        Flag=0;
        memset(Next,0,sizeof(Next));
    }
}*Root;

// Insert the Martian word str together with its English translation ss.
void Insert(string str,string ss)
{
    Trie *p=Root;
    Trie *q=NULL;
    int len=str.size();
    for(int i=0;i<len;i++)
    {
        int key=str[i]-'a';
        if(!p->Next[key])
        {
            q=new Trie();
            p->Next[key]=q;
        }
        p=p->Next[key];
    }
    // Mark the end of the word and store the translation only there,
    // not on every node along the path.
    p->Flag=1;
    p->chc=ss;
    return;
} 

// Append the translation of str to fin, or str itself if it is not in the dictionary.
void Qurey(string str)
{
    Trie *p=Root;
    int len=str.size();
    int flag=0;         // becomes 1 when the path for str does not exist
    for(int i=0;i<len;i++)
    {
        int key=str[i]-'a';
        if(!p->Next[key])
        {
            flag=1;
            break;
        }
        p=p->Next[key];
    }
    if(!flag&&p->Flag==1)
        fin+=p->chc;
    else
        fin+=str;
    return;
}

int main()
{
    //freopen("sampletem.txt","r",stdin);
    string a,b;
    Root=new Trie();
    cin>>a;                     // skip the first "START"
    while(cin>>a)
    {
        if(a=="END")
            break;
        cin>>b;
        Insert(b,a);            // key is the Martian word, value the English word
    }
    cin>>a;                     // skip the second "START"
    getchar();                  // consume the newline after it
    int c;
    string tem="";              // letters of the word currently being read
    while((c=getchar())!=EOF)
    {
        // All words are lowercase, so an uppercase 'E' can only start the closing "END".
        if(c=='E')
        {
            c=getchar();
            c=getchar();
            break;
        }
        else if(c>='a'&&c<='z')
        {
            tem+=(char)c;
        }
        else 
        {
            // A non-letter ends the current word: translate it, then copy the delimiter.
            if(tem!="")
                Qurey(tem);
            fin+=(char)c;
            tem="";
        }
    }
    cout<<fin;
    return 0;
}
