[LeetCode] 839. Similar String Groups 类似字符串组

时间 2019-11-05

标签 leetcode similar string groups 类似字符串繁體版

原文原文链接

Two strings X and Y are similar if we can swap two letters (in different positions) of X, so that it equals Y.html

For example, "tars" and "rats" are similar (swapping at positions 0 and 2), and "rats" and "arts" are similar, but "star" is not similar to "tars", "rats", or "arts".git

Together, these form two connected groups by similarity: {"tars", "rats", "arts"} and {"star"}. Notice that "tars" and "arts" are in the same group even though they are not similar. Formally, each group is such that a word is in the group if and only if it is similar to at least one other word in the group.github

We are given a list A of strings. Every string in A is an anagram of every other string in A. How many groups are there?数组

Example 1:app

Input: ["tars","rats","arts","star"]
Output: 2

Note:函数

A.length <= 2000
A[i].length <= 1000
A.length * A[i].length <= 20000
All words in A consist of lowercase letters only.
All words in A have the same length and are anagrams of each other.
The judging time limit has been increased for this question.

这道题定义了字符串之间的一种类似关系，说是对于字符串X和Y，交换X中两个不一样位置上的字符，若能够获得Y的话，就说明X和Y是类似的。如今给了咱们一个字符串数组，要将类似的字符串放到一个群组里，这里同一个群组里的字符串没必要任意两个都类似，而是只要能经过某些结点最终连着就好了，有点像连通图的感受，将全部连通的结点算做一个群组，问整个数组能够分为多少个群组。因为这道题的本质就是求连通图求群组个数，既然是图，考察的就是遍历啦，就有 DFS 和 BFS 的解法。先来看 DFS 的解法，虽然说本质是图的问题，但并非真正的图，没有邻接链表啥的，这里判断两个结点是否相连其实就是判断是否类似。因此能够写一个判断是否类似的子函数，实现起来也很是的简单，只要按位置对比字符，若不相等则 diff 自增1，若 diff 大于2了直接返回 false，由于只有 diff 正好等于2或者0的时候才类似。题目中说了字符串之间都是异构词，说明字符的种类个数都同样，只是顺序不一样，就不可能出现奇数的 diff，而两个字符串彻底相等时也是知足要求的，是类似的。下面来进行 DFS 遍历，用一个 HashSet 来记录遍历过的字符串，对于遍历到的字符串，若已经在 HashSet 中存在了，直接跳过，不然结果 res 自增1，并调用递归函数。这里递归函数的做用是找出全部类似的字符串，首先仍是判断当前字符串 str 是否访问过，是的话直接返回，不然加入 HashSet 中。而后再遍历一遍原字符串数组，每个遍历到的字符串 word 都和 str 检测是否类似，类似的话就对这个 word 调用递归函数，这样就能够找出全部类似的字符串啦，参见代码以下：this

解法一：code

class Solution {
public:
    int numSimilarGroups(vector<string>& A) {
        int res = 0, n = A.size();
        unordered_set<string> visited;
        for (string str : A) {
            if (visited.count(str)) continue;
            ++res;
            helper(A, str, visited);
        }
        return res;
    }
    void helper(vector<string>& A, string& str, unordered_set<string>& visited) {
        if (visited.count(str)) return;
        visited.insert(str);
        for (string word : A) {
            if (isSimilar(word, str)) {
                helper(A, word, visited);
            }
        }
    }
    bool isSimilar(string& str1, string& str2) {
        for (int i = 0, cnt = 0; i < str1.size(); ++i) {
            if (str1[i] == str2[i]) continue;
            if (++cnt > 2) return false;
        }
        return true;
    }
};

咱们也可使用 BFS 遍从来作，用一个 bool 型数组来标记访问过的单词，同时用队列 queue 来辅助计算。遍历全部的单词，假如已经访问过了，则直接跳过，不然就要标记为 true，而后结果 res 自增1，这里跟上面 DFS 的解法原理同样，要一次找完和当前结点相连的全部结点，只不过这里用了迭代的 BFS 的写法。先将当前字符串加入队列 queue 中，而后进行 while 循环，取出队首字符串，再遍历一遍全部字符串，遇到访问过的就跳过，而后统计每一个字符串和队首字符串之间的不一样字符个数，假如最终 diff 为0的话，说明是同样的，此时不加入队列，可是要标记这个字符串为 true；若最终 diff 为2，说明是类似的，除了要标记字符串为 true，还要将其加入队列进行下一轮查找，参见代码以下：orm

解法二：htm

class Solution {
public:
    int numSimilarGroups(vector<string>& A) {
        int res = 0, n = A.size();
        vector<bool> visited(n);
        queue<string> q;
        for (int i = 0; i < n; ++i) {
            if (visited[i]) continue;
            visited[i] = true;
            ++res;
            q.push(A[i]);
            while (!q.empty()) {
                string t = q.front(); q.pop();
                for (int j = 0; j < n; ++j) {
                    if (visited[j]) continue;
                    int diff = 0;
                    for (int k = 0; k < A[j].size(); ++k) {
                        if (t[k] == A[j][k]) continue;
                        if (++diff > 2) break;
                    }
                    if (diff == 0) visited[j] = true;
                    if (diff == 2) {
                        visited[j] = true;
                        q.push(A[j]);
                    }
                }
            }
        }
        return res;
    }
};

对于这种群组归类问题，很适合使用联合查找 Union Find 来作，LeetCode 中也有其余用到这个思路的题目，好比 Friend Circles，Accounts Merge，Redundant Connection II，Redundant Connection，Number of Islands II，Graph Valid Tree，和 Number of Connected Components in an Undirected Graph。都是要用一个 root 数组，每一个点开始初始化为不一样的值，若是两个点属于相同的组，就将其中一个点的 root 值赋值为另外一个点的位置，这样只要是相同组里的两点，经过 getRoot 函数获得相同的值。因此这里对于每一个结点 A[i]，都遍历前面全部结点 A[j]，假如两者不类似，直接跳过；不然将 A[j] 结点的 root 值更新为i，这样全部相连的结点的 root 值就相同了，一个群组中只有一个结点的 root 值会保留为其的初始值，因此最后只要统计到底还有多少个结点的 root 值仍是初始值，就知道有多少个群组了，参见代码以下：

解法三：

class Solution {
public:
    int numSimilarGroups(vector<string>& A) {
        int res = 0, n = A.size();
        vector<int> root(n);
        for (int i = 0; i < n; ++i) root[i] = i;
        for (int i = 1; i < n; ++i) {
            for (int j = 0; j < i; ++j) {
                if (!isSimilar(A[i], A[j])) continue;
                root[getRoot(root, j)] = i;
            }
        }
        for (int i = 0; i < n; ++i) {
            if (root[i] == i) ++res;
        }
        return res;
    }
    int getRoot(vector<int>& root, int i) {
        return (root[i] == i) ? i : getRoot(root, root[i]);
    }
    bool isSimilar(string& str1, string& str2) {
        for (int i = 0, cnt = 0; i < str1.size(); ++i) {
            if (str1[i] == str2[i]) continue;
            if (++cnt > 2) return false;
        }
        return true;
    }
};

Github 同步地址:

https://github.com/grandyang/leetcode/issues/839

相似题目：

Friend Circles

Accounts Merge

Redundant Connection II

Redundant Connection

Number of Islands II

Graph Valid Tree

Number of Connected Components in an Undirected Graph

参考资料：

https://leetcode.com/problems/similar-string-groups/

https://leetcode.com/problems/similar-string-groups/discuss/200934/My-Union-Find-Java-Solution

https://leetcode.com/problems/similar-string-groups/discuss/282212/Java-two-solutions-BFS-and-Union-Find

https://leetcode.com/problems/similar-string-groups/discuss/296251/DFS-with-Explanation-(Also-Highlighting-Misleading-Strategy)

LeetCode All in One 题目讲解汇总(持续更新中...)