Trie (Prefix Tree)

Date: 2019-11-13

Trie:

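The idea: trade memory for time. A trie stores each shared prefix only once, so strings with a common prefix do not repeat it, and a lookup never re-compares characters it has already matched. A trie is a fairly simple data structure and easy to understand, but that simplicity has a price: with an array of child pointers in every node, as in the implementation below, its memory consumption is very high.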

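Main applications: counting and sorting large numbers of strings (though not limited to strings), which is why search engine systems often use tries for word-frequency statistics over text.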
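As a rough illustration of word-frequency counting, here is a minimal sketch in C. It assumes words made only of the lowercase letters a-z and a small fixed node pool; the names `FreqNode`, `freq_add`, `freq_get`, and `FREQ_MAX_NODES` are invented for this example and do not come from the original post.

```c
#include <assert.h>
#include <stddef.h>

#define FREQ_MAX_NODES 1000   /* assumed pool size, enough for a small demo */

typedef struct FreqNode {
    int count;                    /* how many times a word ending here was added */
    struct FreqNode *next[26];    /* one child slot per lowercase letter */
} FreqNode;

/* Static storage is zero-initialized: all counts are 0, all children NULL. */
static FreqNode freq_pool[FREQ_MAX_NODES];
static int freq_used = 1;         /* freq_pool[0] is the root */

/* Add one occurrence of word (assumed to contain only 'a'..'z'). */
void freq_add(const char *word)
{
    FreqNode *p = &freq_pool[0];
    for (; *word; ++word) {
        int id = *word - 'a';
        if (p->next[id] == NULL) {
            assert(freq_used < FREQ_MAX_NODES);   /* pool must not overflow */
            p->next[id] = &freq_pool[freq_used++];
        }
        p = p->next[id];
    }
    p->count++;
}

/* Return how many times word was added, or 0 if it was never added. */
int freq_get(const char *word)
{
    const FreqNode *p = &freq_pool[0];
    for (; *word; ++word) {
        p = p->next[*word - 'a'];
        if (p == NULL)
            return 0;
    }
    return p->count;
}
```

Both functions take a number of steps equal to the length of the word, no matter how many words are already stored. A prefix that was never added as a whole word, such as "tr" after adding "tree", reports a count of 0.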

Example:

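Suppose you are given 100000 words, each at most 10 characters long. For each word, we need to decide whether it has appeared before and, if so, at which position it first appeared.
The most naive method checks, for each word, all the words before it. That makes the algorithm O(n^2), which is clearly unacceptable for n = 100000. Now think about it differently. Say the word being queried is abcd. Among the earlier words, those starting with b, c, d, f and so on obviously need not be considered; we only have to check whether abcd exists among the words starting with a. Likewise, among the words starting with a, we only need to consider those whose second letter is b, and so on. A tree model gradually becomes clear...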
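The narrowing-down described above can be sketched in C as follows. This is only an illustration under stated assumptions: words contain only the lowercase letters a-z, and the node pool is sized for a small demo rather than for the full 100000-word input. The names `PosNode`, `pos_visit`, and `POS_MAX_NODES` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define POS_MAX_NODES 1000   /* assumed demo size; the full input could need up to 100000*10+1 nodes */

typedef struct PosNode {
    int first_pos;               /* 1-based position of the first occurrence, 0 = not a word yet */
    struct PosNode *next[26];    /* one child slot per lowercase letter */
} PosNode;

/* Static storage is zero-initialized: first_pos is 0, all children NULL. */
static PosNode pos_pool[POS_MAX_NODES];
static int pos_used = 1;         /* pos_pool[0] is the root */

/* Process word as the pos-th word of the input (1-based).
 * Returns 0 if the word is new, otherwise the position where it first appeared. */
int pos_visit(const char *word, int pos)
{
    PosNode *p = &pos_pool[0];
    for (; *word; ++word) {
        int id = *word - 'a';
        if (p->next[id] == NULL) {
            assert(pos_used < POS_MAX_NODES);   /* pool must not overflow */
            p->next[id] = &pos_pool[pos_used++];
        }
        p = p->next[id];   /* only words sharing this prefix remain candidates */
    }
    if (p->first_pos != 0)
        return p->first_pos;   /* seen before */
    p->first_pos = pos;        /* first occurrence: remember where */
    return 0;
}
```

Each call follows at most 10 child links for a word of at most 10 letters, so processing n words takes on the order of 10n steps instead of n^2 comparisons. Visiting abcd, abc, abcd in that order returns 0, 0, 1.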

Sample problem:

Phone List

Time Limit: 3000/1000 MS (Java/Others) Memory Limit: 32768/32768 K (Java/Others)

Problem Description

Given a list of phone numbers, determine if it is consistent in the sense that no number is the prefix of another. Let’s say the phone catalogue listed these numbers:
1. Emergency 911
2. Alice 97 625 999
3. Bob 91 12 54 26
In this case, it’s not possible to call Bob, because the central would direct your call to the emergency line as soon as you had dialled the first three digits of Bob’s phone number. So this list would not be consistent.

Input

The first line of input gives a single integer, 1 <= t <= 40, the number of test cases. Each test case starts with n, the number of phone numbers, on a separate line, 1 <= n <= 10000. Then follows n lines with one unique phone number on each line. A phone number is a sequence of at most ten digits.

Output

For each test case, output “YES” if the list is consistent, or “NO” otherwise.

Sample Input

2
3
911
97625999
91125426
5
113
12340
123440
12345
98346

Code:

#include<stdio.h>
#include<string.h>
typedef struct node
{
    int num;    // 1 if this node ends a complete phone number, 0 otherwise
    struct node *next[10];
}node;
node memory[100005];   // a test case needs at most 10000 numbers x 10 digits + 1 root = 100001 nodes
int k;
int insert(char *s,node *T)
{
    int i,len,id,j;
    node *p,*q;
    p=T;
    len=strlen(s);
    for(i=0;i<len;++i)
    {
        id=s[i]-'0';
        if(p->num==1)   // an earlier, shorter number is a prefix of s (short first, long later)
            return 1;
        if(p->next[id]==NULL)
        {
            q=&memory[k++];
            q->num=0;
            for(j=0;j<10;++j)
                q->next[j]=NULL;
            p->next[id]=q;
        }
        p=p->next[id];
    }
    for(i=0;i<10;++i)      // if p has any child, s is a prefix of an earlier, longer number (long first, short later)
        if(p->next[i]!=NULL)
            return 1;
    p->num=1;
    return 0;
}
int main()
{
    int m,n,flag,i;
    node *T;
    char s[15];
    scanf("%d",&m);
    while(m--)
    {
        k=0;          // restart allocation at index 0 for every test case so the pool is reused and the memory limit is not exceeded
        T=&memory[k++];
        T->num=0;
        for(i=0;i<10;++i)
            T->next[i]=NULL;
        flag=0;
        scanf("%d",&n);
        while(n--)
        {
            scanf("%14s",s);   // width limit keeps the read inside s[15]
            if(flag)
                continue;
            if(insert(s,T))
                flag=1;
        }
        if(flag)
            printf("NO\n");
        else
            printf("YES\n");
    }
    return 0;
}

Trie:

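Three basic properties:

1. The root node holds no character; every other node holds exactly one character.

2. Concatenating the characters on the path from the root to a node gives the string that node represents.

3. All children of a given node hold different characters.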
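The second property can be seen by walking the tree depth-first and rebuilding each stored string from its path. The sketch below is an illustration, not code from the original post: it assumes digit-only keys of at most 10 characters, as in the Phone List problem, and the names `WalkNode`, `walk_insert`, `walk_list`, and the `WALK_` constants are invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define WALK_MAX_NODES 256   /* assumed pool size for a small demo */
#define WALK_MAX_LEN   10    /* assumed maximum key length */

typedef struct WalkNode {
    int is_end;                  /* 1 if a stored string ends at this node */
    struct WalkNode *next[10];   /* one child slot per digit */
} WalkNode;

/* Static storage is zero-initialized: is_end is 0, all children NULL. */
static WalkNode walk_pool[WALK_MAX_NODES];
static int walk_used = 1;        /* walk_pool[0] is the root */

/* Insert a string of digits '0'..'9'. */
void walk_insert(const char *s)
{
    WalkNode *p = &walk_pool[0];
    assert(strlen(s) <= WALK_MAX_LEN);
    for (; *s; ++s) {
        int id = *s - '0';
        if (p->next[id] == NULL) {
            assert(walk_used < WALK_MAX_NODES);   /* pool must not overflow */
            p->next[id] = &walk_pool[walk_used++];
        }
        p = p->next[id];
    }
    p->is_end = 1;
}

/* buf[0..depth) holds the characters on the path from the root to p.
 * Appends every stored string below p, followed by a space, to out. */
static void walk_collect(const WalkNode *p, char *buf, int depth, char *out)
{
    int i;
    if (p->is_end) {
        strncat(out, buf, (size_t)depth);   /* the path itself is the string */
        strcat(out, " ");
    }
    for (i = 0; i < 10; ++i) {              /* children visited in digit order */
        if (p->next[i] != NULL) {
            buf[depth] = (char)('0' + i);
            walk_collect(p->next[i], buf, depth + 1, out);
        }
    }
}

/* Write all stored strings into out, space-separated.
 * The caller must make out large enough for all of them. */
void walk_list(char *out)
{
    char buf[WALK_MAX_LEN + 1];
    out[0] = '\0';
    walk_collect(&walk_pool[0], buf, 0, out);
}
```

Because children are visited in digit order and a node is reported before its descendants, the strings come out in lexicographic order. After inserting 911, 97625999, 91125426 and 12, `walk_list` produces "12 911 91125426 97625999 ".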

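Advantages: shared prefixes are stored only once, and unnecessary string comparisons are kept to a minimum. A lookup takes time proportional to the length of the key, independent of how many strings are stored, which is why tries are often said to answer such queries faster than a hash table.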

Disadvantages: if there are many strings and they share almost no common prefixes, the trie consumes a great deal of memory.
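To put a number on that cost, here is a small sketch that computes an upper bound on the size of an array-based trie with ten child pointers per node. The names `CostNode`, `worst_case_nodes`, and `worst_case_bytes` are hypothetical, and the bound assumes that no character ever shares a node with another string.

```c
#include <stddef.h>

/* Same layout as a digit trie node: one flag plus ten child pointers. */
typedef struct CostNode {
    int num;
    struct CostNode *next[10];
} CostNode;

/* Upper bound on the node count: every character of every string
 * gets its own node, plus one root node. */
size_t worst_case_nodes(size_t n_strings, size_t max_len)
{
    return n_strings * max_len + 1;
}

/* Upper bound on the memory used by those nodes, in bytes. */
size_t worst_case_bytes(size_t n_strings, size_t max_len)
{
    return worst_case_nodes(n_strings, max_len) * sizeof(CostNode);
}
```

For 10000 strings of at most 10 characters the bound is 100001 nodes. The byte count depends on the platform; where `int` takes 4 bytes and pointers take 8, each node would typically occupy 88 bytes, giving roughly 8.8 MB to hold at most 100000 characters of actual data.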