Understanding how KMP computes the next array

Date: 2019-12-14

First, the code:

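#include <cstring>  // strlen

// Fills next[0..pLen-1] for the 0-indexed pattern p, with next[0] = -1.
void GetNext(const char* p, int next[])
{
    int pLen = (int)strlen(p);
    next[0] = -1;
    int k = -1;  // end of the candidate prefix (-1 means no candidate)
    int j = 0;   // end of the suffix being extended
    while (j < pLen - 1)
    {
        // p[k] is the prefix character, p[j] is the suffix character
        if (k == -1 || p[j] == p[k])
        {
            ++k;
            ++j;
            next[j] = k;
        }
        else
        {
            k = next[k];
        }
    }
}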

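First, be clear about what next[j] means: leaving p[j] itself out, it is the length of the longest prefix of the substring 0~j-1 that is also a suffix of it (the whole substring does not count).

Because indices start at 0, this longest match length is also the index of the position right after that longest matching prefix.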

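It follows that the next array has the following property.

For the substring 0~j (with next[j] > 0):

p[j-1] = p[next[j]-1]. Why? Because next[j] is the position right after the longest prefix that matches the suffix whose last character is p[j-1].

So in the substring 0~next[j], the character just before the last position must match the last character of that suffix, p[j-1]; hence p[next[j]-1] = p[j-1].

Now apply the same argument to the substring 0~next[j] (with next[next[j]] > 0):

p[next[j]-1] = p[next[next[j]]-1]

next[next[j]] is the position right after the longest prefix that matches the suffix whose last character is p[next[j]-1]. ("Longest" only concerns the characters before p[next[j]]; whether the character after the prefix also matches is not implied, and is exactly what the algorithm tests next.) So in the substring 0~next[next[j]], the character just before the last position must equal the last character of that suffix, p[next[j]-1].

Therefore

p[next[next[j]]-1] = p[next[j]-1]

What can we conclude from this?

p[j-1] = p[next[j]-1] = p[next[next[j]]-1]

In other words, start at position j and apply the next array some number of times (j = next[j]) to reach j'. As long as j' > 0,

p[j-1] = p[j'-1] always holds.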

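Next, for the code above, consider the state at each position.

If the position is 0, a mismatch puts it into a self-loop: it can only keep being compared against position 0. So we build the self-loop

0 -> -1 -> 0

For a position other than 0 there are two cases.

No common prefix and suffix can be found. If the current position is j, no prefix matches a suffix whose last character is p[j-1], so the only option is to fall all the way back to position 0.

A common prefix and suffix can be found. This again splits into two sub-cases.

    The comparison succeeds.

    Then next for the following position is the current k+1. On the first comparison k equals next[j], so if p[j] == p[next[j]], then next[j+1] = next[j]+1. After one or more fallbacks k is an iterate of next rather than next[j] itself, and the general rule is next[j+1] = k+1.

    next[j] does not count p[j]; if the comparison succeeds again, the common prefix/suffix length grows by one for the newly added character p[j].

    More concretely, the longest matching prefix grows from 0~next[j]-1 to 0~next[j].

    The comparison fails.

    We set k = next[k] (the step written j = next[j] above). By the earlier result p[j-1] = p[j'-1], this yields prefixes of strictly decreasing length that still match a suffix whose last character is p[j-1].

    If k falls back to -1, there really is no common prefix and suffix: drop the match length kept so far and go straight back to position 0.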

Written out without hiding the underlying logic (and with 1-based indices), the routine that computes the Next array is:

void initNext(){
    Next[1]=0;
    int p1=0,p2=1;
    while(p2<=m){
        if(p1==0){
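            //Not even the first character could be matched, so the prefix pointer has to rest on the first character, position 1
            //No prefix matches the suffix ending at p2
            Next[p2+1]=1;//so if the next position mismatches, compare it against position 1
            //This case is special: the string before position 1 is empty, which still fits the definition of Next, since the longest prefix/suffix match of the empty string is 0
            //Comparing against position 1 therefore means the empty string before it has a longest match of 0
            //It also means that with a longest match of 0 the pointer can only go back to position 1, which is still waiting to be matched
            p1=1;//reset the pointer; p1 rests on the next character that might extend the match
            p2++;//move on to the longest match for the suffix ending at p2+1
        }
        else if(b[p1]==b[p2]){
            //the prefix and the suffix agree on one more character
            Next[p2+1]=Next[p2]+1;//extends the match 1~Next[p2]-1 to 1~Next[p2] (only valid while p1==Next[p2]; after a fallback the value should be p1+1)
            p1=Next[p2]+1;//p1 moves to the next character that might extend the match
            p2++;//move on to the longest match for the suffix ending at p2+1
        }
        else{
            //On the first fallback (or the first of a run of fallbacks), p1 equals Next[p2]
            //Next[p2] means the prefix 1~Next[p2]-1 matches, with b[Next[p2]-1]=b[p2-1];
            //p1=Next[p1] is then the same as p1=Next[Next[p2]];
            //this still gives a prefix with b[Next[Next[p2]]-1]=b[Next[p2]-1]=b[p2-1]
            p1=Next[p1];//keep falling back to shorter prefixes whose last character b[p1-1] still matches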
        }
    }
}

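A few more points are worth noting.

The position where pointer p1 rests is one whose state is still undetermined: it has to be compared to find out.

p1 = minIndex-1 means that not even the first character could be matched. In that case we take the self-loop and bring p1 back to minIndex, because position minIndex is the one waiting to be matched.

For a string of length len, the prefix and the suffix of length len are trivially equal, since both are the string itself. That match is not counted because it carries no information.

For example, the string a has the prefix a and the suffix a, but their length is len, so they are not counted.

This is why the prefix pointer p1 and the suffix pointer p2 start with p2-p1=1, that is, adjacent: the shortest string worth considering has length 2, with one pointer in front and one behind.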

HDU 1711: find the first position at which the pattern occurs in the text

#include <iostream>
#include <cstdio>
#include <cstring>
using namespace std;
const int maxn=1e5+7;
const int N=1e6+7;
int Next[maxn],a[N],b[maxn],n,m;
void initNext(){
    memset(Next,0,sizeof(Next));
    Next[1]=0;
    int p1=0,p2=1;
    while(p2<=m){
        if(p1==0){
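            //Not even the first character could be matched, so the prefix pointer has to rest on the first character, position 1
            //No prefix matches the suffix ending at p2
            Next[p2+1]=1;//so if the next position mismatches, compare it against position 1
            //This case is special: the string before position 1 is empty, which still fits the definition of Next, since the longest prefix/suffix match of the empty string is 0
            //Comparing against position 1 therefore means the empty string before it has a longest match of 0
            //It also means that with a longest match of 0 the pointer can only go back to position 1, which is still waiting to be matched
            p1=1;//reset the pointer; p1 rests on the next character that might extend the match
            p2++;//move on to the longest match for the suffix ending at p2+1
        }
        else if(b[p1]==b[p2]){
            //the prefix and the suffix agree on one more character
            Next[p2+1]=Next[p2]+1;//extends the match 1~Next[p2]-1 to 1~Next[p2] (only valid while p1==Next[p2]; after a fallback the value should be p1+1)
            p1=Next[p2]+1;//p1 moves to the next character that might extend the match
            p2++;//move on to the longest match for the suffix ending at p2+1
        }
        else{
            //On the first fallback (or the first of a run of fallbacks), p1 equals Next[p2]
            //Next[p2] means the prefix 1~Next[p2]-1 matches, with b[Next[p2]-1]=b[p2-1];
            //p1=Next[p1] is then the same as p1=Next[Next[p2]];
            //this still gives a prefix with b[Next[Next[p2]]-1]=b[Next[p2]-1]=b[p2-1]
            p1=Next[p1];//keep falling back to shorter prefixes whose last character b[p1-1] still matches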
        }
    }
}
int Match(){
    int i,j;i=1;j=1;
    while(i<=n){
        if(j==0){
            i++;j++;
        }
        else if(a[i]==b[j]){
            i++;j++;
        }
        else j=Next[j];
        if(j==m+1) return i-j+1;
    }
    return -1;
}
int main(){
    int T;scanf("%d",&T);
    while(T--){
        scanf("%d%d",&n,&m);
        for(int i=1;i<=n;++i) scanf("%d",a+i);
        for(int i=1;i<=m;++i) scanf("%d",b+i);
        initNext();
        //for(int i=1;i<=m;++i) printf("nxt:%d,",Next[i]);printf("\n");
        printf("%d\n",Match());
    }
    return 0;
}

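However, this program is actually wrong: the transition Next[p2+1]=Next[p2]+1; is incorrect.

The correct transition is Next[p2+1]=p1+1 (written Next[j+1]=i+1 in the code further down), because after several fallbacks the prefix pointer holds an iterate Next[p2'] rather than Next[p2] itself.

That is why this way of computing Next gets WA on poj1961; it passed here only because the test data is weak.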

Period

Time Limit: 3000MS		Memory Limit: 30000K
Total Submissions: 17771		Accepted: 8562

Description

For each prefix of a given string S with N characters (each character has an ASCII code between 97 and 126, inclusive), we want to know whether the prefix is a periodic string. That is, for each i (2 <= i <= N) we want to know the largest K > 1 (if there is one) such that the prefix of S with length i can be written as A^K, that is A concatenated K times, for some string A. Of course, we also want to know the period K.

Input

The input consists of several test cases. Each test case consists of two lines. The first one contains N (2 <= N <= 1 000 000) – the size of the string S. The second line contains the string S. The input file ends with a line, having the number zero on it.

Output

For each test case, output "Test case #" and the consecutive test case number on a single line; then, for each prefix with length i that has a period K > 1, output the prefix size i and the period K separated by a single space; the prefix sizes must be in increasing order. Print a blank line after each test case.

Sample Input

3
aaa
12
aabaabaabaab
0

Sample Output

Test case #1
2 2
3 3

Test case #2
2 2
6 2
9 3
12 4

The correct code is as follows:

#include <iostream>
#include <cstdio>
#include <cstring>
using namespace std;
const int maxn=1e6+7;
char s[maxn];int n,Next[maxn];

int main(){
    int cas=0;
    while(~scanf("%d",&n)){
        if(n!=0&&cas) printf("\n");
        if(n==0) break;
        printf("Test case #%d\n",++cas);
        scanf("%s",s+1);
        memset(Next,-1,sizeof(Next));
        Next[1]=0;int i=0,j=1;
        while(j<=n){
            if(i==0){
                Next[j+1]=1;//1 not 0
                i=1;j++;
                if(Next[j]!=-1&&j>=2&&j<=n&&j%(j-Next[j])==0&&s[j]==s[Next[j]]){
                    printf("%d %d\n",j,j/(j-Next[j]));
                }
            }
            else if(s[i]==s[j]){
                Next[j+1]=i+1;//
                i++;j++;
                if(Next[j]!=-1&&j>=2&&j<=n&&j%(j-Next[j])==0&&s[j]==s[Next[j]]){
                    printf("%d %d\n",j,j/(j-Next[j]));
                }
            }
            else {
                i=Next[i];//not j=Next[j]
            }
        }
    }
    return 0;
}

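Also, one logic error to avoid: a periodic string is not necessarily a multiple of 2 (aaa, for example, is a repeated three times).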

Power Strings

Time Limit: 3000MS		Memory Limit: 65536K
Total Submissions: 48116		Accepted: 20030

Description

Given two strings a and b we define a*b to be their concatenation. For example, if a = "abc" and b = "def" then a*b = "abcdef". If we think of concatenation as multiplication, exponentiation by a non-negative integer is defined in the normal way: a^0 = "" (the empty string) and a^(n+1) = a*(a^n).

Input

Each test case is a line of input representing s, a string of printable characters. The length of s will be at least 1 and will not exceed 1 million characters. A line containing a period follows the last test case.

Output

For each s you should print the largest n such that s = a^n for some string a.

Sample Input

abcd
aaaa
ababab
.

Sample Output

1
4
3

Hint

This problem has huge input, use scanf instead of cin to avoid time limit exceed.

#include <iostream>
#include <cstring>
#include <cstdio>
using namespace std;
const int maxn=1e6+7;
char s[maxn];int Next[maxn];
int main(){
    while(~scanf("%s",s+1)){
        int len=strlen(s+1);
        if(len==1){
            if(s[1]=='.') break;
            printf("1\n");continue;
        }
        memset(Next,-1,sizeof(Next));
        Next[1]=0;int p1=0,p2=1;
        while(p2<=len){
            if(p1==0){
                Next[++p2]=1;
                p1=1;
            }
            else if(s[p1]==s[p2]){
                Next[++p2]=p1+1;
                p1++;
            }
            else p1=Next[p1];
        }
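        //No special case for Next[len]==1: that would print 1 for "aa", whose answer is 2.
        //Since len>=2 here, Next[len]<=len-1 and the divisor is never 0.
        if((len%(len-Next[len])==0)&&s[len]==s[Next[len]]){
            printf("%d\n",len/(len-Next[len]));
        }
        else printf("1\n");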
    }
    return 0;
}

Oulipo

Time Limit: 1000MS		Memory Limit: 65536K
Total Submissions: 40122		Accepted: 16122

Description

The French author Georges Perec (1936–1982) once wrote a book, La disparition, without the letter 'e'. He was a member of the Oulipo group. A quote from the book:

Tout avait Pair normal, mais tout s’affirmait faux. Tout avait Fair normal, d’abord, puis surgissait l’inhumain, l’affolant. Il aurait voulu savoir où s’articulait l’association qui l’unissait au roman : stir son tapis, assaillant à tout instant son imagination, l’intuition d’un tabou, la vision d’un mal obscur, d’un quoi vacant, d’un non-dit : la vision, l’avision d’un oubli commandant tout, où s’abolissait la raison : tout avait l’air normal mais…

Perec would probably have scored high (or rather, low) in the following contest. People are asked to write a perhaps even meaningful text on some subject with as few occurrences of a given “word” as possible. Our task is to provide the jury with a program that counts these occurrences, in order to obtain a ranking of the competitors. These competitors often write very long texts with nonsense meaning; a sequence of 500,000 consecutive 'T's is not unusual. And they never use spaces.

So we want to quickly find out how often a word, i.e., a given string, occurs in a text. More formally: given the alphabet {'A', 'B', 'C', …, 'Z'} and two finite strings over that alphabet, a word W and a text T, count the number of occurrences of W in T. All the consecutive characters of W must exactly match consecutive characters of T. Occurrences may overlap.

Input

The first line of the input file contains a single number: the number of test cases to follow. Each test case has the following format:

One line with the word W, a string over {'A', 'B', 'C', …, 'Z'}, with 1 ≤ |W| ≤ 10,000 (here |W| denotes the length of the string W).
One line with the text T, a string over {'A', 'B', 'C', …, 'Z'}, with |W| ≤ |T| ≤ 1,000,000.

Output

For every test case in the input file, the output should contain a single number, on a single line: the number of occurrences of the word W in the text T.

Sample Input

3
BAPC
BAPC
AZA
AZAZAZA
VERDI
AVERDXIVYERDIAN

Sample Output

1
3
0

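KMP counts the occurrences of a substring. Note what happens when p1>t_len (this is why the Next array is computed up to maxIndex+1): as long as S contains no null characters, the situation is the same as

the terminating null character of T failing to match S, which means that every character of T before it did match. Continuing with p1=Next[p1] is therefore correct; resetting with p1=1 is wrong and undercounts.

Take the text AZAZAZA and the word AZA. When the word pointer reaches 4, Next lets it jump to 2 and go on to match the AZA in the middle; resetting to 1 would skip that occurrence and go straight to the last AZA.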

#include <iostream>
#include <cstdio>
#include <cstring>
using namespace std;
const int maxn=1e6+7;
char S[maxn],T[maxn];
int Next[maxn];
int main(){
    int t;scanf("%d",&t);
    while(t--){
        scanf("%s%s",T+1,S+1);
        memset(Next,-1,sizeof(Next));
        int s_len=strlen(S+1),t_len=strlen(T+1);
        Next[1]=0;int p1=0,p2=1;
        while(p2<=t_len){
            if(p1==0){
                Next[++p2]=1;p1=1;
            }
            else if(T[p1]==T[p2]){
                Next[++p2]=++p1;
            }
            else p1=Next[p1];
        }
        int ans=0;
        p1=p2=1;
        while(p2<=s_len){
            if(p1==0||T[p1]==S[p2]){
                p1++;p2++;
                if(p1>t_len){
                    ans++;//no need to reset p1 here: the next comparison hits the terminating '\0' of T and falls back via Next[p1]
                }
            }
            else {
                p1=Next[p1];
            }
        }
        printf("%d\n",ans);
    }
    return 0;
}

Cyclic Nacklace

Time Limit: 2000/1000 MS (Java/Others) Memory Limit: 32768/32768 K (Java/Others)
Total Submission(s): 8622 Accepted Submission(s): 3707

Problem Description

CC always becomes very depressed at the end of this month, he has checked his credit card yesterday, without any surprise, there are only 99.9 yuan left. he is too distressed and thinking about how to tide over the last days. Being inspired by the entrepreneurial spirit of "HDU CakeMan", he wants to sell some little things to make money. Of course, this is not an easy task.

As Christmas is around the corner, Boys are busy in choosing christmas presents to send to their girlfriends. It is believed that chain bracelet is a good choice. However, Things are not always so simple, as is known to everyone, girl's fond of the colorful decoration to make bracelet appears vivid and lively, meanwhile they want to display their mature side as college students. after CC understands the girls demands, he intends to sell the chain bracelet called CharmBracelet. The CharmBracelet is made up with colorful pearls to show girls' lively, and the most important thing is that it must be connected by a cyclic chain which means the color of pearls are cyclic connected from the left to right. And the cyclic count must be more than one. If you connect the leftmost pearl and the rightmost pearl of such chain, you can make a CharmBracelet. Just like the pictrue below, this CharmBracelet's cycle is 9 and its cyclic count is 2:

Now CC has brought in some ordinary bracelet chains, he wants to buy minimum number of pearls to make CharmBracelets so that he can save more money. but when remaking the bracelet, he can only add color pearls to the left end and right end of the chain, that is to say, adding to the middle is forbidden.
CC is satisfied with his ideas and ask you for help.

Input

The first line of the input is a single integer T ( 0 < T <= 100 ) which means the number of test cases.
Each test case contains only one line describe the original ordinary chain to be remade. Each character in the string stands for one pearl and there are 26 kinds of pearls being described by 'a' ~'z' characters. The length of the string Len: ( 3 <= Len <= 100000 ).

Output

For each case, you are required to output the minimum count of pearls added to make a CharmBracelet.

Sample Input

3
aaa
abca
abcde

Sample Output

0
2
5

Author

possessor WC

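When printing the answer, the quantity that matters is the longest prefix/suffix match of the whole string, which is Next[len+1]-1 in this 1-indexed convention. Next[len] only describes the first len-1 characters, so a formula built on Next[len] needs special cases for Next[len]=1 and for len-Next[len]=1, and still has to compare s[len] with s[Next[len]].

abca and abce must be told apart: abca ends with its own first character and needs 2 more pearls, abce does not and needs 4.

xyzabcabcqe has no period at all; the repeated abc in the middle does not help.

Although pearls may be added at both ends, padding on the right is enough: whenever the front of the chain is a suffix of the cycle and the back is a prefix of it, we can look for the period from left to right and never need to carve the periodic part out of the middle.

Examples: exyzabcabcqe, qexyzabcabcqe, qexyzabcabcqex, xyzabcabcqex, xyzabcabcqexy, and so on.

The length of the cycle is always len-(Next[len+1]-1). If it divides len and is smaller than len, nothing has to be added; otherwise the last, incomplete cycle is filled up.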

#include <iostream>
#include <cstdio>
#include <cstring>
using namespace std;
const int maxn=1e5+7;
char s[maxn];int Next[maxn];
int main(){
    int T;scanf("%d",&T);
    while(T--){
        scanf("%s",s+1);
        memset(Next,-1,sizeof(Next));
        Next[1]=0;int p1=0,p2=1,len=strlen(s+1);
        while(p2<=len){
            if(p1==0){
                Next[++p2]=1;p1=1;
            }
            else if(s[p1]==s[p2]){
                Next[++p2]=++p1;
            }
            else p1=Next[p1];
        }
        //Next[len+1]-1 is the longest prefix/suffix match of the whole string,
        //so the cycle length is len minus that value.
        //(Using Next[len] with len%mod alone prints 0 for "abac", which needs 4.)
        int border=Next[len+1]-1;
        int mod=len-border;
        if(border>0&&len%mod==0) printf("0\n");//already two or more complete cycles
        else printf("%d\n",mod-len%mod);//fill up the last cycle (a whole extra cycle when border==0)
    }
    return 0;
}