[LeetCode] Short Encoding of Words

Date: 2019-11-06

Given a list of words, we may encode it by writing a reference string S and a list of indexes A.

For example, if the list of words is ["time", "me", "bell"], we can write it as S = "time#bell#" and indexes = [0, 2, 5].

Then for each index, we will recover the word by reading from the reference string from that index until we reach a "#" character.

What is the length of the shortest reference string S possible that encodes the given words?

Example:

Input: words = ["time", "me", "bell"]
Output: 10
Explanation: S = "time#bell#" and indexes = [0, 2, 5].

Note:

1 <= words.length <= 2000.
1 <= words[i].length <= 7.
Each word has only lowercase letters.
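
The decoding rule from the statement can be sketched as follows. This is only an illustration: decodeWords is a hypothetical helper name, not part of the LeetCode interface, and it assumes every index points inside S.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not part of the problem's API): recover each word by
// reading from its start index up to, but not including, the next '#'.
std::vector<std::string> decodeWords(const std::string& s, const std::vector<int>& indexes) {
    std::vector<std::string> words;
    for (int start : indexes) {
        std::size_t end = s.find('#', start);  // first '#' at or after start
        words.push_back(s.substr(start, end - start));
    }
    return words;
}
```

For the example, decodeWords("time#bell#", {0, 2, 5}) yields time, me and bell: the indexes 0 and 2 both read up to the same # at position 4.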

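We are given an array of words to encode. Each word written into the reference string is followed by a #, the start of every word is recorded in an index array, and each word ends at the next #. Words that can be merged should be merged, and we are asked for the minimum length of the resulting reference string. The statement is easy to follow; the hard part is deciding when words can be merged. In the example, me and time can be merged as long as their start positions are recorded: time starts at 0 and me starts at 2, and since both are read up to the same #, both can be recovered. Note that if me were replaced by im or tim, merging would no longer be possible, because decoding reads every character from the start position up to the #. In other words, a word can be merged only if it is a suffix of another word.

Since me is contained in time, we should have time# in place first and then check whether it already covers me, rather than generate me# first and handle time afterwards. So longer words must be processed first: sort the word array by length in descending order with a custom comparator. Then walk through the array and search for each word in the encoding string. If it is not found, append the word followed by a #. If it is found, we get its position; for example, searching for me in time# gives found = 2. We then verify that the occurrence is immediately followed by a # by inspecting position found + word.size(). Only an occurrence that ends right at a # can be reused; if there is none, the word cannot be merged and we still append it together with a #. Finally, return the length of the encoding string. See the code below:

Solution 1: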

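#include <algorithm>
#include <string>
#include <vector>
using namespace std;

class Solution {
public:
    int minimumLengthEncoding(vector<string>& words) {
        string str = "";
        // Process longer words first so that shorter suffixes can reuse them.
        sort(words.begin(), words.end(), [](const string& a, const string& b){return a.size() > b.size();});
        for (const string& word : words) {
            // Look for an occurrence that is immediately followed by '#'; the first hit may
            // not qualify (e.g. "me" in "mexx#time#"). str ends with '#', so the index stays in range.
            size_t found = str.find(word);
            while (found != string::npos && str[found + word.size()] != '#') {
                found = str.find(word, found + 1);
            }
            if (found == string::npos) {
                str += word + "#";
            }
        }
        return str.size();
    }
};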
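
To see the longest-first idea produce an actual encoding rather than only its length, here is a minimal sketch that also returns the index list. The name buildEncoding and its return type are hypothetical choices for illustration. It assumes the words contain only lowercase letters, and it searches for word + "#" directly, which can only match an occurrence that ends at a #.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: returns the reference string and one start index per
// input word, in the original input order.
std::pair<std::string, std::vector<int>> buildEncoding(const std::vector<std::string>& words) {
    // Visit word positions from longest to shortest without reordering `words`.
    std::vector<int> order(words.size());
    std::iota(order.begin(), order.end(), 0);
    std::stable_sort(order.begin(), order.end(), [&](int a, int b) {
        return words[a].size() > words[b].size();
    });

    std::string s;
    std::vector<int> indexes(words.size());
    for (int i : order) {
        // A match of word + '#' means the word is a suffix of something already written.
        std::size_t found = s.find(words[i] + "#");
        if (found == std::string::npos) {
            found = s.size();        // the word will start at the current end of s
            s += words[i] + "#";
        }
        indexes[i] = static_cast<int>(found);
    }
    return {s, indexes};
}
```

For ["time", "me", "bell"] this gives S = "time#bell#" and indexes = [0, 2, 5], matching the example in the statement.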

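Here is another approach that needs no custom comparator. From the analysis above, what we are really looking for is suffixes: me is a suffix of time. We would like words that can be merged to sit next to each other, which makes them easy to handle, but suffixes are awkward to sort by. So we turn them into prefixes by reversing every word: time becomes emit and me becomes em. Sorting in the default lexicographic order then gives em, emit, so mergeable words end up adjacent, and the current word always merges into the one right after it, which makes things much easier. We only need to check whether the current word is a prefix of the next word: if it is, add 0; otherwise add the current word's length plus 1, the extra 1 being the #. Checking for a prefix is simple: take a prefix of the same length from the next word and compare. Because each step looks at the next word, we stop at the second-to-last word to stay in bounds, and then add the last word's length plus 1 to the result res. See the code below:

Solution 2: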

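#include <algorithm>
#include <string>
#include <vector>
using namespace std;

class Solution {
public:
    int minimumLengthEncoding(vector<string>& words) {
        int res = 0, n = words.size();
        // Reverse every word so that suffixes become prefixes.
        for (int i = 0; i < n; ++i) reverse(words[i].begin(), words[i].end());
        // After sorting, a word that is a prefix of any other word is immediately followed by a word that starts with it.
        sort(words.begin(), words.end());
        for (int i = 0; i < n - 1; ++i) {
            // A word that is a prefix of its successor costs nothing extra.
            res += (words[i] == words[i + 1].substr(0, words[i].size())) ? 0 : words[i].size() + 1;
        }
        // The last word is never merged into anything.
        return res + words.back().size() + 1;
    }
};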
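
The reverse-and-sort step is easier to follow on the example. The sketch below only exposes the intermediate order and the prefix test; reversedSorted and isPrefix are hypothetical helper names introduced for illustration.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper: reverse each word, then sort lexicographically.
// Takes the vector by value so the caller's words are left unchanged.
std::vector<std::string> reversedSorted(std::vector<std::string> words) {
    for (std::string& w : words) std::reverse(w.begin(), w.end());
    std::sort(words.begin(), words.end());
    return words;
}

// True when `prefix` is a prefix of `word`.
bool isPrefix(const std::string& prefix, const std::string& word) {
    return word.compare(0, prefix.size(), prefix) == 0;
}
```

For ["time", "me", "bell"] the reversed, sorted order is em, emit, lleb. em is a prefix of emit and adds nothing, emit is not a prefix of lleb and adds 4 + 1, and the last word lleb adds 4 + 1, for a total of 10.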

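The next approach is also quite clever. It uses a HashSet and first puts all the words into it. For each word we go through all of its proper suffixes; for time, these are ime, me and e. If the HashSet contains one of these suffixes, we delete it, so me is removed from the HashSet. This guarantees that the words left over cannot be merged any further. Finally, add the length of each remaining word to the result res, plus 1 each for the #. See the code below:

Solution 3: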

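#include <string>
#include <unordered_set>
#include <vector>
using namespace std;

class Solution {
public:
    int minimumLengthEncoding(vector<string>& words) {
        int res = 0;
        unordered_set<string> st(words.begin(), words.end());
        // Iterate over the input rather than the set, so the set is not modified while it is traversed.
        for (const string& word : words) {
            // Remove every proper suffix of word from the set.
            for (size_t i = 1; i < word.size(); ++i) {
                st.erase(word.substr(i));
            }
        }
        // Each remaining word contributes its length plus one '#'.
        for (const string& word : st) res += word.size() + 1;
        return res;
    }
};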
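
To inspect which words survive the suffix removal, the same idea can be written so that it returns the set instead of the total length. This is a sketch under the problem's constraints; keptWords is a hypothetical helper name.

```cpp
#include <cstddef>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical helper: returns the words that are not a proper suffix of any
// other word, i.e. the words that must appear in full in the reference string.
std::unordered_set<std::string> keptWords(const std::vector<std::string>& words) {
    std::unordered_set<std::string> kept(words.begin(), words.end());
    for (const std::string& word : words) {
        // Start at 1 so that the word itself is never erased.
        for (std::size_t i = 1; i < word.size(); ++i) {
            kept.erase(word.substr(i));
        }
    }
    return kept;
}
```

For ["time", "me", "bell"] the result is {time, bell}, which gives (4 + 1) + (4 + 1) = 10. Since a word has at most 7 letters, each word triggers at most 6 erase attempts.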

References:

https://leetcode.com/problems/short-encoding-of-words/

https://leetcode.com/problems/short-encoding-of-words/discuss/125825/Easy-to-understand-Java-solution

https://leetcode.com/problems/short-encoding-of-words/discuss/125822/C%2B%2B-4-lines-reverse-and-sort

https://leetcode.com/problems/short-encoding-of-words/discuss/125811/C%2B%2BJavaPython-Easy-Understood-Solution-with-Explanation

LeetCode All in One: index of problem walkthroughs (continuously updated...)