Hands-On Case: String Hash Table Operations

Date: 2020-07-24



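Summary: What to do when the C library offers no string hash table.

Colleagues preparing for the C trusted-programming certification often ask what to do if an exam question needs a string hash table, since the C library does not provide one. No past exam question has strictly required a hash table in C (at least, I could solve them all with plain C), but it still pays to be prepared. Here is a representative problem to try: leetcode-cn.com/problems/su…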

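You are given a string s and a list of words, words, that all have the same length. Find the starting positions of every substring of s that is exactly a concatenation of all the words in words.
Note that the substring must match the words in words exactly, with no other characters in between, but the order in which the words are concatenated does not matter.

Example:
Input:
  s = "barfoothefoobarman",
  words = ["foo","bar"]
Output: [0,9]
Explanation:
The substrings starting at index 0 and 9 are "barfoo" and "foobar" respectively.
The order of the output does not matter; [9,0] is also a valid answer.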

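Setting the programming language aside, this problem is fairly easy with a hash table. In C, you can roll your own hash table, which is more than enough for problems of this kind.

For the overall approach, see solution 2 at leetcode-cn.com/problems/su…; here I only cover the simplest way to build a hash table.

First, pick a hash function. I use the djb2 algorithm (see www.cse.yorku.ca/~oz/hash.ht…), which has a low collision rate and an even distribution. It is also simple to implement, just two or three lines of code; remember the key numbers 5381 and 33.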

If you just want to have a good hash function, and cannot wait, djb2 is one of the best string hash functions i know. it has excellent distribution and speed on many different sets of keys and table sizes.


unsigned long
hash(unsigned char *str)
{
    unsigned long hash = 5381;
    int c;

    while (c = *str++)
        hash = ((hash << 5) + hash) + c; /* hash * 33 + c */

    return hash;
}
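To make the function concrete, here is a minimal sketch that hashes a NUL-terminated string with djb2 and folds the result into a table index. The names `djb2` and `bucket_of` and the `table_size` parameter are illustrative assumptions, not part of the referenced page.

```c
#include <stddef.h>

/* djb2: start from 5381, then hash = hash * 33 + c for every byte. */
static unsigned long djb2(const char *str)
{
    const unsigned char *p = (const unsigned char *)str;
    unsigned long hash = 5381;

    while (*p != '\0')
        hash = ((hash << 5) + hash) + *p++;   /* hash * 33 + c */
    return hash;
}

/* Hypothetical helper: reduce the hash to an index in [0, table_size).
 * table_size must be greater than zero. */
static size_t bucket_of(const char *str, size_t table_size)
{
    return (size_t)(djb2(str) % table_size);
}
```

For the one-character string "a" (byte value 97) the hash is 5381 * 33 + 97 = 177670, so with a table of 1000 slots it lands in slot 670.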

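With a string hash function, a long string can be converted to a number, and that number, reduced modulo the table size, can serve as an array index (the key) for storing information. So how large should the hash table be? In general it must be larger than the number of items stored: with at most 100 items, for example, the table size must exceed 100. This problem does not state the maximum number of words, so the size has to be estimated. After several rounds of submissions I recorded how many of the 173 test cases passed for each table size (table size -> cases passed). The results suggest that the largest word count in this problem is probably around 300, with an average below 50: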

5 -> 110/173
10 -> 143/173
50 -> 170/173
100 -> 170/173
300 -> 172/173
400 -> 173/173
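One way to reason about these numbers: when two different words fall into the same slot, their counters are merged, and the smaller the table, the more often that happens. The sketch below counts such collisions for a given table size. The function `count_collisions` is a hypothetical helper, and it assumes the input words are distinct (a repeated word would also be counted as a collision).

```c
#include <stdlib.h>

/* djb2 over a NUL-terminated string. */
static unsigned long djb2(const char *str)
{
    const unsigned char *p = (const unsigned char *)str;
    unsigned long hash = 5381;

    while (*p != '\0')
        hash = ((hash << 5) + hash) + *p++;
    return hash;
}

/* Hypothetical helper: count how many of the n words land in a slot
 * that an earlier word already occupies. Returns -1 on bad input or
 * allocation failure. */
static int count_collisions(const char **words, int n, size_t table_size)
{
    if (table_size == 0)
        return -1;

    unsigned char *used = calloc(table_size, 1);
    if (used == NULL)
        return -1;

    int collisions = 0;
    for (int i = 0; i < n; ++i) {
        size_t slot = djb2(words[i]) % table_size;
        if (used[slot])
            ++collisions;      /* slot already taken by an earlier word */
        else
            used[slot] = 1;
    }
    free(used);
    return collisions;
}
```

In the degenerate case of a one-slot table, every word after the first collides, so n distinct words give n - 1 collisions. Running the helper over a word list with several candidate sizes gives a rough feel for how large the table needs to be.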

Here is my solution:

#include <stdlib.h>
#include <string.h>

// Hash table size. This is an estimate; it depends on how many words the data actually contains.
#define MAXWORDCOUNT 1000

static int wordCount[MAXWORDCOUNT];      // expected count of each word, indexed by hash
static int currWordCount[MAXWORDCOUNT];  // word counts inside the current window

// ref: http://www.cse.yorku.ca/~oz/hash.html
// Note: counters are indexed by hash only, so two different words with the
// same hash value share one counter.
unsigned long DJBHash(const char* s, int len) {
    unsigned long hash = 5381; // empirical seed: low collision rate, even distribution
    while (len--) {
        hash = (((hash << 5) + hash) + *(s++)) % MAXWORDCOUNT; /* hash * 33 + c */
    }
    return hash;
}

int* findSubstring(char * s, char ** words, int wordsSize, int* returnSize){
    memset(wordCount, 0, sizeof(wordCount));
    *returnSize = 0;

    const int kSLen = strlen(s);
    if (kSLen == 0 || wordsSize == 0) return NULL;

    const int kWordLen = strlen(words[0]);
    // Store the word counts in the hash table. key: word, value: number of occurrences
    for (int i = 0; i < wordsSize; ++i)
        ++wordCount[DJBHash(words[i], kWordLen)];

    int *result = malloc(sizeof(int) * kSLen);
    for (int i = 0; i < kWordLen; ++i) {
        for (int j = i; j + kWordLen * wordsSize <= kSLen; j += kWordLen) {
            // Count the words in the current window: all of them for the first
            // window, only the newly entered last word afterwards
            for (int k = (j == i ? 0 : wordsSize - 1); k < wordsSize; ++k)
                ++currWordCount[DJBHash(s + j + k * kWordLen, kWordLen)];

            // If the two tables are equal, the words in the window match the given word list exactly
            if (memcmp(wordCount, currWordCount, sizeof(wordCount)) == 0)
                result[(*returnSize)++] = j;

            // Drop the first word of the window before sliding right
            --currWordCount[DJBHash(s + j, kWordLen)];
        }
        // Reset the window table
        memset(currWordCount, 0, sizeof(currWordCount));
    }
    return result;
}
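The solution above indexes counters by hash value alone, so its correctness depends on the table being large enough that different words do not collide. If that is a concern, a common alternative is to store the key in each slot and resolve collisions with linear probing. The following is a minimal sketch of that idea, not the author's method; `Slot`, `TABLE_SIZE`, `table_find`, `table_add`, and `table_get` are hypothetical names, and the table assumes fewer than `TABLE_SIZE` distinct keys.

```c
#include <stddef.h>
#include <string.h>

#define TABLE_SIZE 1024   /* assumed capacity; must exceed the number of distinct keys */

typedef struct {
    const char *key;   /* points into caller-owned memory; NULL marks an empty slot */
    int len;           /* key length in bytes (keys need not be NUL-terminated) */
    int count;         /* value stored for this key */
} Slot;

/* djb2 over exactly len bytes. */
static unsigned long djb2_n(const char *s, int len)
{
    unsigned long hash = 5381;
    for (int i = 0; i < len; ++i)
        hash = ((hash << 5) + hash) + (unsigned char)s[i];
    return hash;
}

/* Return the slot holding key, or the empty slot where it would be stored.
 * Returns NULL only if the table is full and the key is absent. */
static Slot *table_find(Slot *table, const char *key, int len)
{
    size_t i = djb2_n(key, len) % TABLE_SIZE;

    for (size_t probes = 0; probes < TABLE_SIZE; ++probes) {
        Slot *slot = &table[i];
        if (slot->key == NULL ||
            (slot->len == len && memcmp(slot->key, key, (size_t)len) == 0))
            return slot;
        i = (i + 1) % TABLE_SIZE;   /* linear probing: try the next slot */
    }
    return NULL;
}

/* Add delta to the key's counter, inserting the key if needed.
 * Returns the new count, or -1 if the table is full. */
static int table_add(Slot *table, const char *key, int len, int delta)
{
    Slot *slot = table_find(table, key, len);
    if (slot == NULL)
        return -1;
    if (slot->key == NULL) {
        slot->key = key;
        slot->len = len;
    }
    slot->count += delta;
    return slot->count;
}

/* Return the key's counter, or 0 if the key was never inserted. */
static int table_get(Slot *table, const char *key, int len)
{
    Slot *slot = table_find(table, key, len);
    return (slot != NULL && slot->key != NULL) ? slot->count : 0;
}
```

A table is just a zero-initialized array such as `Slot table[TABLE_SIZE] = {0};`. Keys are never removed in this sketch: decrementing with `table_add(table, key, len, -1)` leaves the key in place with a lower count, which keeps probing chains intact. The trade-off against the original solution is that two tables can no longer be compared with a single `memcmp`; the window check has to look up each word's count instead.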
